perm filename CLNTLN.MSG[COM,LSP]14 blob
sn#861897 filedate 1988-10-07 generic text, type C, neo UTF8
∂17-Dec-87 1712 CL-Characters-mailer test
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 17 Dec 87 17:12:33 PST
Date: Thu, 17 Dec 87 11:35:57 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <871217.113557.baggins@IBM.com>
Subject: test
test of new router name
∂17-Dec-87 1809 CL-Characters-mailer mailbox name change, JEIDA interaction, sub-topics
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 17 Dec 87 18:09:01 PST
Date: Thu, 17 Dec 87 17:46:49 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <871217.174649.baggins@IBM.com>
Subject: mailbox name change, JEIDA interaction, sub-topics
Subject: new mailbox router is now operational
As evidenced by the rejected message below,
cl-natural-languages is no more. please use cl-characters.
Regards,
Thom
------------------------------------------------------------
Date: 17 Dec 87 10:59:48
From: Mailer-Daemon at IBM.COM
To: BAGGINS
IBM.COM Mail Server unable to deliver the following mail to:cl-natural-languages
Reason:
Negative reply from Host:sail.stanford.edu
550 I don't know anybody named cl-natural-languages
** Text of Mail follows **
Date: Thu, 17 Dec 87 10:31:39 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee"
<cl-natural-languages@sail.stanford.edu>
Message-ID: <871217.103139.baggins@IBM.com>
Subject: mailbox name change, JEIDA interaction, sub-topics
Sometime soon, our router at stanford will change to cl-characters.
I'll broadcast as soon as I determine it is operational.
My counterpart at the IBM Tokyo Research Lab presented the IBM
character extensions proposal at a JEIDA meeting in Nov. JEIDA knows
that this has not yet been discussed by our ANSI committee.
Per our discussion at the Ft Collins meeting, I am inviting ISO and JEIDA
to join our conferencing (via the stanford router as soon as the
new name is in effect).
Larry made the reasonable suggestion that we decide
on the sub-topics of the proposals and deal with each (initially)
somewhat independently.
Hopefully, everyone has a copy of the proposal material by now!
Let me know if not and I will ship a copy asap.
My stab at sub-topics is:
   Type hierarchy
      e.g. thin-string
   Explicit character set manipulation
      e.g. define-char-set
   Equivalence
      e.g. define-equivalence-class
   I/O interface
      e.g. print-width
   Character set (or subset) predicates
      e.g. jcl:jis-char-p
   ?other suggestions?
Happy Holidays,
Thom
∂21-Dec-87 1918 CL-Characters-mailer Network communications
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 21 Dec 87 10:59:00 PST
Date: Mon, 21 Dec 87 10:13:40 PST
From: Thom Linden <baggins@ibm.com>
To: "Dr. Takayasu Ito" <tito%aoba.aoba.tohoku.junet@relay.cs.net>,
"Dr. Taiichi Yuasa" <yuasa%kurims.kurims.kyoto-u.junet@relay.cs.net>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <871221.101340.baggins@IBM.com>
Subject: Network communications
The ANSI subcommittee handling character issues communicates
over the networks via a broadcast node (cl-characters) at Stanford.
You and/or the interested members of your committees are encouraged
to participate in these conversations. If you inform me of the
appropriate net ids, I will have them added to the distribution
list.
Regards,
Thom Linden
∂22-Dec-87 0600 CL-Characters-mailer Type hierarchy
Received: from XEROX.COM by SAIL.STANFORD.EDU with TCP; 22 Dec 87 06:00:07 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 22 DEC 87 06:01:12 PST
Date: 22 Dec 87 05:59 PST
From: Masinter.pa@Xerox.COM
Subject: Type hierarchy
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Thu, 17 Dec 87 17:46:49
PST
To: cl-characters@sail.stanford.edu
Message-ID: <871222-060112-6764@Xerox>
I've spent some time thinking about this:
I think it is a fundamental error, an unacceptable incompatible change, to
change the Common Lisp type STRING to be something other than (VECTOR
STRING-CHAR), as is suggested by all of the extant proposals.
I think one of our fundamental design goals is that the extended language
features being proposed be in fact extensions, in that current CL functions not
be in error.
Currently, you can assume after (TYPEP x 'STRING) that X can hold any
STRING-CHAR element. Allowing STRING to denote several different types of vector
whose element types are < STRING-CHAR would violate that assumption.
It isn't necessary to change STRING in an incompatible way, however. What is
really the intent of these proposals is to extend the various functions in CL
that currently take "STRING" to also allow them to take other types as well.
Suppose we define a new type
   (defun character-vector-p (x)
     (and (vectorp x) (subtypep (array-element-type x) 'string-char)))
   (deftype character-vector () '(satisfies character-vector-p))
Now extend all functions that take strings as input arguments and have them
accept any kind of character-vector.
∂29-Dec-87 1449 CL-Characters-mailer Type hierarchy
Received: from SCRC-RIVERSIDE.ARPA by SAIL.STANFORD.EDU with TCP; 29 Dec 87 14:49:26 PST
Received: from LM1.NSC.DIALNET.SYMBOLICS.COM by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 214218; 29 Dec 87 12:52:39 EST
Received: from LM2.NSC.Dialnet.Symbolics.COM by LM1.NSC.Dialnet.Symbolics.COM via CHAOS with CHAOS-MAIL id 19079; Tue 29-Dec-87 23:37:20 JST
Date: Tue, 29 Dec 87 23:37 JST
From: Carl Hoffman <CWH@LM1.NSC.Dialnet.Symbolics.COM>
Subject: Type hierarchy
To: Masinter.pa@Xerox.COM, CL-Characters@SAIL.Stanford.EDU
cc: Shiota@LM1.NSC.Dialnet.Symbolics.COM
In-Reply-To: <871222-060112-6764@Xerox>
Message-ID: <871229233714.3.CWH@LM2.NSC.Dialnet.Symbolics.COM>
    Date: 22 Dec 87 05:59 PST
    From: Masinter.pa@Xerox.COM
    I think it is a fundamental error, an unacceptable incompatible change, to
    change the Common Lisp type STRING to be something other than (VECTOR
    STRING-CHAR), as is suggested by all of the extant proposals.
Why do you feel that this is a fundamental error? In the Symbolics Genera 7.1
implementation, the type STRING is the same as (OR (VECTOR STRING-CHAR) (VECTOR
CHARACTER)). As far as I can tell, this hasn't caused a major compatibility
problem. The CL programs I've seen which use strings have all run in the
Symbolics implementation without modification.
The Symbolics implementation returns the following results:
(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'CHARACTER) '(VECTOR STRING-CHAR)) -> NIL
(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'STRING-CHAR) '(VECTOR CHARACTER)) -> NIL
(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'CHARACTER) 'STRING) -> T
(STRINGP (MAKE-ARRAY 1 :ELEMENT-TYPE 'CHARACTER)) -> T
(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'STRING-CHAR) 'STRING) -> T
(STRINGP (MAKE-ARRAY 1 :ELEMENT-TYPE 'STRING-CHAR)) -> T
MAKE-ARRAY :ELEMENT-TYPE 'STRING-CHAR returns an array which allocates 8 bits
per character. MAKE-ARRAY :ELEMENT-TYPE 'CHARACTER returns an array which
allocates 28 bits per character (16 bits of code, 8 bits of font, and 4 bits of
modifier).
I believe that the current plan is to change MAKE-ARRAY :ELEMENT-TYPE
'STRING-CHAR to return an array which allocates 16 bits per character (for 16
bits of code) and to use MAKE-ARRAY :ELEMENT-TYPE 'STANDARD-CHAR to do what is
currently done with MAKE-ARRAY :ELEMENT-TYPE 'STRING-CHAR.
Incidentally, I haven't heard any discussion of Moon's proposal that we simply
use the type STANDARD-CHAR to mean "lowest overhead character storage class"
rather than introducing a new type THIN-CHAR or INTERNAL-THIN-CHAR.
    Currently, you can assume after (TYPEP x 'STRING) that X can hold any
    STRING-CHAR element. Allowing STRING to denote several different types of vector
    whose element types are < STRING-CHAR would violate that assumption.
Why not just declare that assumption obsolete, and replace it with the
assumption that if (TYPEP X '(VECTOR STRING-CHAR)) then X can hold any
STRING-CHAR element? Can you give me some examples of code which make use of
your assumption?
    It isn't necessary to change STRING in an incompatible way, however. What is
    really the intent of these proposals is to extend the various functions in CL
    that currently take "STRING" to also allow them to take other types as well.
That is only part of the intent. It is also important that the following
forms return T. (Assume that # represents a Japanese character.)
(STRINGP "#")
(TYPEP "#" 'STRING)
(TYPEP (CHAR "#" 0) 'STRING-CHAR)
If the above forms do not return T, then many CL programs originally written to
handle only standard characters will not work when running in an environment
which has Japanese characters. A major goal of this proposal is to allow these
programs to run without modification. I can show you many programs which
require that the above forms return T.
    Suppose we define a new type
       (defun character-vector-p (x)
         (and (vectorp x) (subtypep (array-element-type x) 'string-char)))
       (deftype character-vector () '(satisfies character-vector-p))
    Now extend all functions that take strings as input arguments and have them
    accept any kind of character-vector.
If you replace STRING-CHAR in your example with CHARACTER, then this is exactly
the same as what Symbolics has already done with the STRINGP function and the
STRING data type.
∂06-Jan-88 2217 CL-Characters-mailer Re: Type hierarchy
Received: from XEROX.COM by SAIL.STANFORD.EDU with TCP; 6 Jan 88 22:17:01 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 06 JAN 88 22:17:43 PST
Date: 6 Jan 88 22:16 PST
From: Masinter.pa@Xerox.COM
Subject: Re: Type hierarchy
In-reply-to: Carl Hoffman <CWH@LM1.NSC.Dialnet.Symbolics.COM>'s message of Tue,
29 Dec 87 23:37 JST
To: CWH@LM1.NSC.Dialnet.Symbolics.COM
cc: Masinter.pa@Xerox.COM, CL-Characters@SAIL.Stanford.EDU,
Shiota@LM1.NSC.Dialnet.Symbolics.COM
Message-ID: <880106-221743-6432@Xerox>
I've composed several replies and not sent them. My time is getting tight so I
have to send something. The problem is, can you have something that is a string
for which it is illegal to store a string-char into it? No, in SCL. But if you
allow (vector standard-char) to also be a subtype of string, then you can have
vectors that can only hold standard-char and not string-char.
However, on even further reflection, there are many "read-only" strings, e.g.,
strings as program constants, for which it is an error to store *anything*.
If we remove char-bits and char-font, we can get rid of the distinction between
string-char and character. This would be an improvement.
Most of the stuff in CLtL about the string type can in fact simply be removed,
while simplifying the language.
∂07-Jan-88 2036 CL-Characters-mailer X3J13 meeting in March
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 7 Jan 88 20:36:20 PST
Date: Wed, 06 Jan 88 09:58:06 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880106.095806.baggins@IBM.com>
Subject: X3J13 meeting in March
I have arranged for our subcommittee to meet at the IBM Almaden
Research Centre on 14, 15 and 18 March. Please let me know if this
poses any difficulties. Also, please let me know if your travel
arrangements or other commitments prevent your attending all or
part.
ARC is south of Palo Alto, roughly a 40 to 50 min commute.
I would suggest our meetings begin at 10am to allow missing most
of the morning freeway congestion. I'll provide more detailed
directions later.
Regards,
Thom
∂07-Jan-88 2036 CL-Characters-mailer subcommittee mailing list
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 7 Jan 88 20:36:05 PST
Date: Tue, 05 Jan 88 21:42:40 PST
From: Thom Linden <baggins@ibm.com>
To: "Richard P. Gabriel" <rpg@sail.stanford.edu>
cc: "Dr. Takayasu Ito" <ito%ito.aoba.tohoku.junet@relay.cs.net>,
"X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880105.214240.baggins@IBM.com>
Subject: subcommittee mailing list
Dick,
Please add the following individuals to the character subcommittee
mailing list:
Yuasa: yuasa@tutics.tut.junet
Umemura: umemura@nuesun.NTT.junet
Kurokawa: KUROKAWA%jpntscvm.bitnet%wiscvm.wisc.edu
Yasumura: yasumura@harl86.harl.hitachi.junet
Regards,
Thom
∂07-Jan-88 2036 CL-Characters-mailer Comments on IBM Proposal from Dave Unitas (LUCID)
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 7 Jan 88 20:36:41 PST
Date: Wed, 06 Jan 88 12:50:38 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880106.125038.baggins@IBM.com>
Subject: Comments on IBM Proposal from Dave Unitas (LUCID)
I have attached some comments on the proposal compiled by
Dave Unitas at LUCID.
Both A and C seem to be good suggestions.
--------------------------------------------------------------------------------
A. Each character set is identified by its Character Set Name, a symbol,
and an associated Character Set Number, a positive integer. (Replace
CSID by Character Set Name or Character Set Number throughout the
document).
Replace char-split and char-join with:
   char-code-point char-code
      takes a character code and returns the component code-point.
   char-code-set char-code
      takes a character code and returns the component character set.
   make-char-code code-point &optional (character-set 0)
      takes a code-point and an optional character set and returns the
      character code. The character set may be specified either as a
      Character Set Name or Character Set Number.
Rename define-char-set to be define-character-set. Make the arguments
keywords rather than positionals. If character-set-number is not
specified, it is assigned from an available character set number
below character-set-limit.
Note: Lucid as a whole is as yet undecided about whether user-
defined character sets are generally useful enough to need to be
included in the language.
B. We are still unsure about whether the type system should be extended
to include extended strings of a particular character set or sets.
C. When printing an extended character to a stream which only accepts
base characters, it is printed in the form
   #\name:xxxx
where name identifies the character set of the character, and xxxx
is the code-point of the character in hex. Strings containing
extended characters are printed in the following form when written
to a base-character only stream:
   #( char0 char1 char2 ...)
with charn as above, following the standard Common Lisp vector
printing convention.
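The accessors in item A can be sketched as follows. This is only an illustration: the 16-bit code-point field and the packing of the set number into the higher bits are assumptions made here, not part of the Lucid comments, and character sets are accepted by number only (lookup by Character Set Name is omitted).

```lisp
;; ASSUMPTION: a character code packs the code-point into the low 16 bits
;; and the character set number into the bits above them.
(defconstant +code-point-bits+ 16)

(defun make-char-code (code-point &optional (character-set 0))
  "Combine CODE-POINT and a character set number into one character code."
  (check-type code-point (unsigned-byte 16))
  (check-type character-set (integer 0))
  (dpb code-point
       (byte +code-point-bits+ 0)
       (ash character-set +code-point-bits+)))

(defun char-code-point (char-code)
  "Return the code-point component of CHAR-CODE."
  (ldb (byte +code-point-bits+ 0) char-code))

(defun char-code-set (char-code)
  "Return the character set number component of CHAR-CODE."
  (ash char-code (- +code-point-bits+)))
```

With this layout, character set 0 leaves the code unchanged, so (make-char-code 65) is simply 65.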
∂10-Jan-88 0010 CL-Characters-mailer Re: Comments on IBM Proposal from Dave Unitas (LUCID)
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 10 Jan 88 00:10:01 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 10 JAN 88 00:10:39 PST
Date: 10 Jan 88 00:09 PST
From: Masinter.pa@Xerox.COM
Subject: Re: Comments on IBM Proposal from Dave Unitas (LUCID)
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Wed, 06 Jan 88 12:50:38
PST
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
Message-ID: <880110-001039-3219@Xerox>
I think one of the problems with the discussion so far is that we've not agreed
really on the fundamental issue of whether the standard is for an optional
extension or for a required part of the standard.
For the record, I think that we should be designing things that are a required
part of every Common Lisp implementation. That is, every function, variable,
etc. in our standard should be in every Common Lisp implementation. In some
implementations, the characters they work on are of course only 7 or 8-bit
ASCII, but all of the functions are there, and if the implementation has more
characters or Japanese characters, the same code will work.
If this is a required part of Common Lisp, we should try to keep to a minimum
the number of new functions, variables, and behaviors we expect from a Common
Lisp implementation.
I don't think that the introduction of new functions and variables for dealing
with character sets really fits that criterion. The only situation where allowing
exposure to multiple character sets within a single implementation makes sense
is one in which the host operating system does not contain facilities to do
character set translation, and yet the programmer is unwilling (using binary
read-byte write-byte) to do that character set translation directly. This seems
like an extremely narrow application domain for the dozens of functions and
variables which exist in the IBM proposal.
= = = =
As a side note, the IBM proposal contains a fairly serious design flaw: the
Common Lisp design is generally careful to avoid having dynamically modifiable
global state that isn't rebindable; e.g., although you can change macro
characters, all changes happen to *readtable*, etc. Yet the character code
equivalency tables in the IBM proposal are global and not yet bindable. Even if
this isn't part of the standard but an internal library for you, you should fix
it.
= = = = = =
About the type system: the discussion on Common-Lisp@sail.stanford.edu on array
element type upgrading is relevant to the type hierarchy here. Suppose arrays
remember their element type. Redefine (stringp x) = (and (vectorp x) (subtypep
(array-element-type x) 'character)).
If you want to make a string that consists of only (capital) vowels, you can
say
   (make-array 10 :element-type '(member #\A #\E #\I #\O #\U)).
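The redefinition above can be tried out roughly as follows. This is a sketch under assumptions: the predicate is given the hypothetical name character-vector-p rather than replacing STRINGP, and the element type an array reports is whatever the implementation upgrades the requested type to.

```lisp
;; Hypothetical stand-in for the redefined STRINGP described above.
(defun character-vector-p (x)
  "True if X is a vector whose element type is a subtype of CHARACTER."
  (and (vectorp x)
       ;; SUBTYPEP returns two values; only the first one is wanted.
       (values (subtypep (array-element-type x) 'character))))

(deftype character-vector () '(satisfies character-vector-p))

;; A "string" restricted to capital vowels. The implementation may
;; upgrade the element type to a wider character type.
(defparameter *vowels*
  (make-array 10 :element-type '(member #\A #\E #\I #\O #\U)
                 :initial-element #\A))
```

A general vector that merely happens to contain characters, such as (vector #\a #\b), has element type T and so does not satisfy the predicate.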
= = = = = = =
Re: "C. When printing an extended character to a stream which only accepts
base characters, it is printed in the form ... Strings containing
extended characters are printed in the following form when written
to a base-character only stream ..."
how are symbols that contain extended characters printed?
What happens when you call PRINC (which is supposed to not include the #\)?
I think this is a bad design. If you want to write extended characters on a base
stream, you should design a character-by-character encoding with escape
characters, and have the write-char primitive for the base stream turn the
extended characters (and the escape) into an escaped character sequence. These
alternative print sequences only handle a small percentage of the situations.
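A character-by-character escape encoding of this kind might look like the sketch below. Everything specific in it is an assumption made for illustration: the function name write-escaped-char, the backslash escape, the "x...;" hex format, and the rule that codes below 128 count as base characters.

```lisp
;; Hypothetical escaping writer for a stream that accepts only base characters.
(defun write-escaped-char (char stream &optional (escape #\\))
  "Write CHAR to STREAM, escaping ESCAPE itself and any non-base character."
  (cond ((char= char escape)
         ;; The escape character is doubled so that output stays unambiguous.
         (write-char escape stream)
         (write-char escape stream))
        ((< (char-code char) 128)
         ;; ASSUMPTION: codes below 128 are base characters.
         (write-char char stream))
        (t
         ;; Anything else becomes ESCAPE, "x", hex digits of the code, ";".
         (format stream "~Cx~4,'0X;" escape (char-code char))))
  char)
```

Because the escaping happens per character, the same routine serves symbols, strings, and PRINC output alike, which is the point of the objection above.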
∂22-Jan-88 0005 CL-Characters-mailer Equivalence binding
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88 00:05:45 PST
Date: Thu, 21 Jan 88 23:56:14 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880121.235614.baggins@IBM.com>
Subject: Equivalence binding
Larry's comment on the binding of equivalency tables is well taken.
Our view of the expected usage of these tables plus trying to keep
the proposed changes to a minimum argued against bindable tables.
Language consistency argues the other way. The introduction of
an equivalencetable object and associated global *equivalencetable*
variable would make this more in line with the 'spirit' of CL.
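A minimal sketch of what a bindable table could look like, by analogy with *readtable*. The representation (a hash table mapping each character to a class representative) and the helper names make-equivalence-table and char-equivalent-p are assumptions for illustration, not taken from the proposal.

```lisp
;; Hypothetical: the current equivalence table, rebindable like *readtable*.
(defvar *equivalencetable* (make-hash-table)
  "Maps a character to the representative of its equivalence class.")

(defun make-equivalence-table (&rest classes)
  "Build a table from CLASSES, each a list of mutually equivalent characters."
  (let ((table (make-hash-table)))
    (dolist (class classes table)
      (dolist (char class)
        ;; The first member of each class serves as its representative.
        (setf (gethash char table) (first class))))))

(defun char-equivalent-p (a b &optional (table *equivalencetable*))
  "True if A and B fall in the same equivalence class of TABLE."
  (char= (gethash a table a) (gethash b table b)))

;; Equivalences established by a binding vanish when the binding exits:
;; (let ((*equivalencetable* (make-equivalence-table '(#\a #\A))))
;;   (char-equivalent-p #\a #\A))
```

The default for TABLE is evaluated at call time, so a dynamic rebinding of *equivalencetable* is honoured without passing the table explicitly.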
∂22-Jan-88 0137 CL-Characters-mailer redefining STANDARD-CHAR
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88 01:37:17 PST
Date: Fri, 22 Jan 88 01:31:08 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.013108.baggins@IBM.com>
Subject: redefining STANDARD-CHAR
Carl's comment on STANDARD-CHAR == lowest overhead character
storage class is precisely what 'base-character' was defined to
be. The rationale for STANDARD-CHAR being the small set of 96
glyphs is based on portability. Programs constrained to the
limited set are likely to be portable across a larger range of
systems and architectures. While this is probably true (can
anyone testify to this?), it may not warrant a unique type.
Other languages typically define a set of 'standard' characters
used for the construction of programs. Does anyone know of a language
other than Lisp which equates this set with a unique type?
I think distinguishing this 'lowest overhead storage class'
type is essential. This must be made for efficiency reasons.
It's unacceptable to force the use of 16bit cells for all
characters in multi-lingual environments.
∂22-Jan-88 0202 CL-Characters-mailer Type Hierarchies
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88 02:01:50 PST
Date: Fri, 22 Jan 88 01:52:46 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.015246.baggins@IBM.com>
Subject: Type Hierarchies
No one has mentioned Bob Kerns's document. The type hierarchies
in the JEIDA, IBM and Kerns documents are essentially identical
(excepting thin vs. base, fat vs. extended, and Bob's user-extensions).
Bob makes a valid point that the two-byte encodings may make way
for three, etc. later. But, it seems best to hide that from the
language as much as possible. I suggest that extended would always
mean the 'largest overhead character storage class'.
∂22-Jan-88 0224 CL-Characters-mailer Font
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88 02:24:34 PST
Date: Fri, 22 Jan 88 02:20:47 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.022047.baggins@IBM.com>
Subject: Font
Bob Kerns's paper contains a set of changes to eliminate char-font
and allows for some migratory behavior. I think the [migration]
aids should not be made part of the standard but be suggestions as bridges
an implementation may provide. I would like to get a straw vote
over the network as to everyone else's opinion.
In summary: (I, not Bob, marked items [migration])
13.1 Character Attributes
     {eliminate references to font}
     [migration] char-font-limit
          The value of char-font-limit is 1, unless the
          implementation implements the obsolete char-font
          feature.
13.2 Predicates on Characters
     {eliminate references to font}
13.3 Character Construction and Selection
     {eliminate references to font}
     [migration] char-font
          This function is obsolete, and returns 0 for
          compatibility.
     [migration] make-char char &optional (bits 0) (font 0)
          (font 0) exists for compatibility.
13.4 Character Conversions
     {eliminate references to font}
     [migration] digit-char weight &optional (radix 10) (font 0)
          (font 0) exists for compatibility.
∂22-Jan-88 0234 CL-Characters-mailer character set predicates
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88 02:34:12 PST
Date: Fri, 22 Jan 88 02:28:18 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.022818.baggins@IBM.com>
Subject: character set predicates
Larry suggested to me that we not try to invent a
correct set of xxx-char-p's
(e.g. kanji-char-p, hiragana-char-p, greek-char-p, etc.) but
instead use the names listed in the ISO std character sets.
This sounds like a good idea .. now we only have to find the
list. In fact, I imagine we can reference the ISO std without
having to incorporate the list into ANSI.
∂26-Jan-88 1928 CL-Characters-mailer Font
Received: from REAGAN.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 26 Jan 88 19:28:16 PST
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 88658; Tue 26-Jan-88 22:28:00 EST
Date: Tue, 26 Jan 88 22:27 EST
From: Robert W. Kerns <RWK@AI.AI.MIT.EDU>
Subject: Font
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
In-Reply-To: <880122.022047.baggins@IBM.com>
Message-ID: <880126222758.5.RWK@JONES.AI.MIT.EDU>
    Date: Fri, 22 Jan 88 02:20:47 PST
    From: Thom Linden <baggins@ibm.com>
    Bob Kerns's paper contains a set of changes to eliminate char-font
    and allows for some migratory behavior. I think the [migration]
    aids should not be made part of the standard but be suggestions as bridges
    an implementation may provide. I would like to get a straw vote
    over the network as to everyone else's opinion.
This seems reasonable to me. So far as anyone can tell, nobody
has ever implemented the Font field. (I haven't checked with
Coral Software to see what they do on the Macintosh; that would seem
to me to be the place most likely to have done so. I'll check with
them shortly.)
∂26-Jan-88 1942 CL-Characters-mailer Type Hierarchies
Received: from REAGAN.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 26 Jan 88 19:42:10 PST
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 88661; Tue 26-Jan-88 22:42:01 EST
Date: Tue, 26 Jan 88 22:41 EST
From: Robert W. Kerns <RWK@AI.AI.MIT.EDU>
Subject: Type Hierarchies
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
In-Reply-To: <880122.015246.baggins@IBM.com>
Message-ID: <880126224159.6.RWK@JONES.AI.MIT.EDU>
    Date: Fri, 22 Jan 88 01:52:46 PST
    From: Thom Linden <baggins@ibm.com>
    No one has mentioned Bob Kerns's document. The type hierarchies
    in the JEIDA, IBM and Kerns documents are essentially identical
    (excepting thin vs. base, fat vs. extended, and Bob's user-extensions).
    Bob makes a valid point that the two-byte encodings may make way
    for three, etc. later. But, it seems best to hide that from the
    language as much as possible. I suggest that extended would always
    mean the 'largest overhead character storage class'.
The issue here is: What should existing code, written using STRING and
STRING-CHAR mean? Should code written in the most general current
fashion continue to mean the most general thing? Or should it mean the
most efficient?
The assumption behind my proposal is that it should mean the most general,
and if you want a more specific, but more space-efficient, type, you use
a new name.
So far as Symbolics is concerned, having STRING-CHAR mean a more specific
type would be LESS of a problem, since in the current Symbolics software,
STRING-CHAR means the 1-byte kind of characters.
The trade-off, in terms of users' code, would be:
1) If STRING-CHAR is more general, users' code will get less efficient when
an implementation implements the new standard, but will work for all input.
2) If STRING-CHAR is more specific, users' code will retain their efficiency,
but may no longer work for the entire range of input found, say, in files or
other strings.
Whether case 2 would be viewed as an incompatibility or not depends on the
exact contract for the code in question. For example, a file copy or string
utility would definitely be regarded as having been broken by the change,
while other code might be regarded as just not taking advantage of a new feature.
By the way, I should make my position in this clear. I am no longer
affiliated with Symbolics. While my opinions and views are probably
indicative of views there, and I have some influence and many contacts
there, I have no official connection, and my views are my own. I
continue to be concerned with conventional as well as specialized
architectures, though.
∂04-Feb-88 0020 CL-Characters-mailer Forwarding note from Ito-san
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 4 Feb 88 00:19:52 PST
Date: Wed, 03 Feb 88 10:49:00 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880203.104900.baggins@IBM.com>
Subject: Forwarding note from Ito-san
-------------------------------------------------------------
To: Thom Linden, Chairman of character subcommittee, Common Lisp, ANSI
From: Takayasu Ito, Chairman of Japanese SC22/Lisp WG and JEIDA
committee on Lisp standardization
Subject: Comments on IBM Proposal "Common LISP - Proposed Extensions
for International Character Set Handling" (Version 01.11.87)
We have received the proposal through Mr. T. Kurokawa, P-member of our
committee. Here is the summary of our comments compiled by him and
Dr. T. Yuasa. (More details may be obtained from them.)
1. Overall impression
We think this is an interesting proposal for initiating extensive
investigation about international character set handling. We need,
however, to continue to work on many aspects of this area.
2. We have had several meetings on this subject. The following is a list
of comments presented at these occasions.
Please note that these are not yet our committee's formal statement.
-- The locality of 'equivalence class' must be maintained as suggested
by Larry Masinter. A variable such as *equivalence-class* would do.
-- It is important to define the 'base' character set.
It is still under hot dispute, but one argues, for example, that
the base should be clearly defined as single-byte, and the extension
should be defined by each national standardization body.
Another says, in Japan, it
should be two-byte size or its maximum should be around 64K. We should
leave the actual implementation of the character to be
implementation-dependent so that US or Europe can enjoy the efficient
implementation of single byte size.
-- The implementation of 'equivalence class table' will be the key for
the efficiency.
-- The relationship between 'equivalence class' and 'readtable' or
'character macro' should be investigated further. We may be able to
reduce the primitives around these character input facilities.
-- The proposal may not be well abstracted. To those who have enough
experience with Common Lisp implementation, the document reflects too
much of a real (perhaps trial) implementation.
For example, csid for base is defined as '0' for US implementation.
3. Cooperation should be continued.
We regard that our cooperation for international character set handling
is indispensable and fruitful. We would like to continue to exchange our
ideas on this subject.
∂04-Feb-88 1732 CL-Characters-mailer Re: X3J13 meeting in March
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 4 Feb 88 17:32:05 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 04 FEB 88 17:32:11 PST
Date: 4 Feb 88 17:32 PST
From: Masinter.pa@Xerox.COM
Subject: Re: X3J13 meeting in March
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Wed, 06 Jan 88 09:58:06
PST
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
Message-ID: <880204-173211-1471@Xerox>
Thom:
For all of those who are staying in Palo Alto for the duration of the meeting,
adding the 40-50 minute commute each way (for a total of 4.5 hours of commute
time) seems to be a considerable imposition.
It would seem to pose far fewer difficulties for almost all of the subcommittee
members to hold the meetings in Palo Alto, since that is where the X3J13 meeting
is being held.
Jan Zubkoff has offered to arrange meeting rooms in Palo Alto for subcommittee
meetings; why not take her up on the offer?
I've been on the road and just returned; I'm sorry for my late reply to this
message.
It is likely that the cleanup committee will meet on Tuesday morning 15 March,
which would interfere with my attending a meeting in Palo Alto until 1 PM and at
ARC until 2 PM.
∂08-Feb-88 1143 CL-Characters-mailer subcommittee meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 8 Feb 88 11:39:33 PST
Date: Mon, 08 Feb 88 11:22:20 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880208.112220.baggins@IBM.com>
Subject: subcommittee meeting
Larry has expressed interest in holding the subcommittee meetings
in Palo Alto to ease the commute. What is the feeling of the
rest of the committee? Please answer the following short
questionnaire:
I am planning on attending the March meeting: YES/NO
Subcommittee meeting at Almaden (San Jose) is OK: YES/NO/DONTCARE
I will be available to attend subcommittee meetings from:
Date Hours
14 Mar 9-4pm
15 Mar 9-4pm
18 Mar 9-4pm
Please respond by 11 Feb so I can make alternate arrangements if
necessary.
Regards,
Thom
∂08-Feb-88 1818 CL-Characters-mailer subcommittee meeting
Received: from AI.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 8 Feb 88 18:17:51 PST
Date: Mon, 8 Feb 88 21:18:15 EST
From: "Robert W. Kerns" <RWK@AI.AI.MIT.EDU>
Subject: subcommittee meeting
To: baggins@IBM.COM
cc: cl-characters@SAIL.STANFORD.EDU
In-reply-to: Msg of Mon 08 Feb 88 11:22:20 PST from Thom Linden <baggins at ibm.com>
Message-ID: <323585.880208.RWK@AI.AI.MIT.EDU>
    Date: Mon, 08 Feb 88 11:22:20 PST
    From: Thom Linden <baggins at ibm.com>
    To:   X3J13: Character Subcommittee <cl-characters at sail.stanford.edu>
    Re:   subcommittee meeting
    Larry has expressed interest in holding the subcommittee meetings
    in Palo Alto to ease the commute. What is the feeling of the
    rest of the committee? Please answer the following short
    questionnaire:
    I am planning on attending the March meeting: YES/NO
YES
    Subcommittee meeting at Almaden (San Jose) is OK: YES/NO/DONTCARE
I would prefer Palo Alto. I can handle San Jose; I do have
friends there I intend to visit, but Palo Alto would leave me
more flexibility.
    I will be available to attend subcommittee meetings from:
        Date     Hours
        14 Mar   9-4pm
        15 Mar   9-4pm
        18 Mar   9-4pm
Yes, so far as I know, but please, let's not consume 100%
of all three days! I'll be suspicious of any work we do at that pace.
    Please respond by 11 Feb so I can make alternate arrangements if
    necessary.
    Regards,
    Thom
∂12-Feb-88 0620 CL-Characters-mailer Font
Received: from XX.LCS.MIT.EDU by SAIL.Stanford.EDU with TCP; 12 Feb 88 06:20:19 PST
Received: from LIVE-OAK.LCS.MIT.EDU by XX.LCS.MIT.EDU via Chaosnet; 12 Feb 88 09:17-EST
Received: from ACORN.Gold-Hill.DialNet.Symbolics.COM by MIT-LIVE-OAK.DialNet.Symbolics.COM via DIAL with SMTP id 80237; 12 Feb 88 09:18:27-EST
Received: from BOSTON.Gold-Hill.DialNet.Symbolics.COM by ACORN.Gold-Hill.DialNet.Symbolics.COM via CHAOS with CHAOS-MAIL id 93908; Thu 11-Feb-88 05:30:59-EST
Date: Fri, 12 Feb 88 08:32 est
From: mike%acorn@oak.lcs.mit.edu
To: RWK@AI.AI.MIT.EDU
Subject: Font
Cc: baggins@ibm.com, cl-characters@sail.stanford.edu
Bob Kerns' paper contains a set of changes to eliminate char-font
and allows for some migratory behavior. I think the [migration]
aids should not be made part of the standard but be suggested as bridges
an implementation may provide. I would like to get a straw vote
over the network as to everyone else's opinion.
This seems reasonable to me. So far as anyone can tell, nobody
has ever implemented the Font field.
We don't implement font either. I think char-font should be dropped.
char-bits is more of a problem, but I think it should be dropped too.
For compatibility we should introduce a "non-standard" migration path
type called "keychord" to represent objects like #\c-m-s-h-S, etc.
The confusion between characters as codepoints in an implicit or
explicit character set and keyboard key combinations is one which is
incredibly useless and should go away. It is particularly troublesome
when you consider a Japanese keyboard sequence, where you need
several keyboard and keychord hits to generate a character, and
bits doesn't correspond to keychords or "shifting" in any reasonable
way.
...mike beckerle
Gold Hill
∂16-Feb-88 1112 CL-Characters-mailer March meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 Feb 88 11:12:09 PST
Date: Tue, 16 Feb 88 10:04:31 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880216.100431.baggins@IBM.com>
Subject: March meeting
In general, folks wanted to meet in the PA area. I have requested
a meeting room for Monday 14 Mar 1-4pm and Tuesday 15 Mar 9-4pm. I'll
relay the confirmation as soon as I have it.
Regards,
Thom
∂16-Feb-88 1506 CL-Characters-mailer bits and charsets
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 Feb 88 15:06:37 PST
Date: Tue, 16 Feb 88 12:39:18 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880216.123918.baggins@IBM.com>
Subject: bits and charsets
    We don't implement font either. I think char-font should be dropped.
    char-bits is more of a problem, but I think it should be dropped too.
    For compatibility we should introduce a "non-standard" migration path
    type called "keychord" to represent objects like #\c-m-s-h-S, etc.
    The confusion between characters as codepoints in an implicit or
    explicit character set and keyboard key combinations is one which is
    incredibly useless and should go away. It is particularly troublesome
    when you consider a Japanese keyboard sequence, where you need
    several keyboard and keychord hits to generate a character, and
    bits doesn't correspond to keychords or "shifting" in any reasonable
    way.
In thinking about Mike's note, it occurs to me that explicit support
for character sets actually encompasses bits. An implementation
could support a character set named 'meta-cyrillic', for example; this
could contain all the Cyrillic character combinations of alt, ctl, etc.
and would be distinct from the non-distinguished Cyrillic characters.
Similarly, this could apply to any conventional character set an
implementation would choose to support.
∂19-Feb-88 1434 CL-Characters-mailer Re: bits and charsets
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 19 Feb 88 14:34:18 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 19 FEB 88 14:24:22 PST
Date: 19 Feb 88 14:24 PST
From: Masinter.pa@Xerox.COM
Subject: Re: bits and charsets
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Tue, 16 Feb 88 12:39:18
PST
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
Message-ID: <880219-142422-9739@Xerox>
Well, the most natural embedding of "bits" is just directly within the character
code space, with or without the character code equivalence space.
On the subject of character sets, I've thought of the following problem with any
kind of dynamic adjustment of character equivalence tables: hash tables which
hash by string-equal won't work if string-equal might depend either on some
dynamically changeable state or even a bindable state.
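To illustrate the concern, here is a minimal sketch in portable Common Lisp.
It only simulates the situation: the names *EQUIVALENCE*, FOLD-KEY, PUT-FOLDED,
GET-FOLDED and DEMO-LOST-ENTRY are hypothetical, invented for this example, and
simple case folding stands in for a character equivalence table.

```lisp
;; Hypothetical: a bindable function mapping each character to the
;; representative of its equivalence class (here, simple case folding).
(defvar *equivalence* #'char-upcase)

(defun fold-key (string)
  "Normalize STRING under the equivalence currently in effect."
  (map 'string *equivalence* string))

(defun put-folded (table key value)
  "Store VALUE under KEY as normalized at insertion time."
  (setf (gethash (fold-key key) table) value))

(defun get-folded (table key)
  "Look up KEY as normalized at lookup time."
  (gethash (fold-key key) table))

(defun demo-lost-entry ()
  "Store under one equivalence, then look up under another."
  (let ((table (make-hash-table :test 'equal)))
    (put-folded table "Foo" 1)            ; filed under "FOO"
    (let ((*equivalence* #'identity))     ; equivalence rebound
      (get-folded table "Foo"))))         ; searches for "Foo": a miss
```

Calling (demo-lost-entry) returns NIL even though the entry was stored, because
the key was filed under the equivalence in effect at insertion time.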
∂29-Feb-88 1304 CL-Characters-mailer subcommittee meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 29 Feb 88 13:04:37 PST
Date: Mon, 29 Feb 88 12:29:52 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880229.122952.baggins@IBM.com>
Subject: subcommittee meeting
The characters subcommittee will meet from 9am-5pm on both
Monday, 14 Mar, and Tuesday, 15 Mar, in the Hyatt Del Monte room.
Regards,
Thom
∂08-Mar-88 1417 CL-Characters-mailer Type Hierarchies
Received: from XX.LCS.MIT.EDU by SAIL.Stanford.EDU with TCP; 8 Mar 88 14:16:28 PST
Received: from LIVE-OAK.LCS.MIT.EDU by XX.LCS.MIT.EDU via Chaosnet; 8 Mar 88 16:53-EST
Received: from ACORN.Gold-Hill.DialNet.Symbolics.COM by MIT-LIVE-OAK.DialNet.Symbolics.COM via DIAL with SMTP id 83562; 8 Mar 88 16:48:56-EST
Received: from BOSTON.Gold-Hill.DialNet.Symbolics.COM by ACORN.Gold-Hill.DialNet.Symbolics.COM via CHAOS with CHAOS-MAIL id 96137; Tue 8-Mar-88 14:56:37-EST
Date: Tue, 8 Mar 88 14:56 est
From: mike%acorn@oak.lcs.mit.edu (mike@gold-hill.com after 1-April-88)
COMMENTS: NOTE %acorn@oak... CHANGES TO @GOLD-HILL.COM ON 1-April-88
To: RWK@AI.AI.MIT.EDU
Subject: Type Hierarchies
Cc: baggins@ibm.com, cl-characters@sail.stanford.edu
No one has mentioned Bob Kerns's document. The type hierarchies
in the JEIDA, IBM and Kerns documents are essentially identical
(excepting thin vs. base, fat vs. extended, and Bob's user-extensions).
Bob makes a valid point that the two-byte encodings may make way
for three, etc. later. But, it seems best to hide that from the
language as much as possible. I suggest that extended would always
mean the 'largest overhead character storage class'.
In fact, my contacts in Japan assure me that more than 16 bits are needed.
The Japanese consider themselves to be the guardians of oriental
interests in these matters. Korean, Mandarin, etc. all require plenty
more than just 16 bits of code. Moreover, having just a two level
hierarchy (8 bit char codes, or "extended") is egocentric, and just
shouldn't be done.
My suggestion is that characters and their types be extended like
the UNSIGNED-BYTE type:
(CHARACTER 8) (CHARACTER 16) (CHARACTER 24) (CHARACTER 32)
where (TYPEP X '(CHARACTER <n>)) means
  (AND (TYPEP X 'CHARACTER)
       (TYPEP (CHAR-CODE X) '(UNSIGNED-BYTE <n>)))
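As a runnable approximation of the definition above, the test can be written as
an ordinary predicate. This is only a sketch: CHARACTER-OF-WIDTH-P is a
hypothetical name, and a real (CHARACTER <n>) type specifier would have to be
provided by the implementation rather than by user code.

```lisp
(defun character-of-width-p (x n)
  "True if X is a character whose code fits in an N-bit unsigned field."
  (and (typep x 'character)
       ;; The type specifier is built at run time because N is a variable.
       (typep (char-code x) `(unsigned-byte ,n))))
```

For example, (character-of-width-p #\A 8) is true on ASCII-based
implementations, while (character-of-width-p 65 8) is false because 65 is not
a character.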
The issue here is: What should existing code, written using STRING and
STRING-CHAR mean? Should code written in the most general current
fashion continue to mean the most general thing? Or should it mean the
most efficient?
This should be solved the same way as for floating point numbers.
*READ-DEFAULT-FLOAT-FORMAT* determines the kind of floats the reader
creates and the printer prints by default.
*DEFAULT-CHARACTER-CODE-SIZE* (pick any name) can determine the
default width.
This variable, however, just affects the reader and printer and does
not authorize the compiler to do anything other than call generic
string operations.
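For reference, the floating point precedent behaves as sketched below. Only
*READ-DEFAULT-FLOAT-FORMAT* is an existing variable; READ-FLOAT-AS is a
hypothetical helper written for this example, and no character-width variable
is shown because none exists yet.

```lisp
(defun read-float-as (format text)
  "Read TEXT with *READ-DEFAULT-FLOAT-FORMAT* bound to FORMAT."
  (let ((*read-default-float-format* format))
    ;; A float written without an exponent marker takes the default format.
    (read-from-string text)))
```

With this helper, (typep (read-float-as 'double-float "1.0") 'double-float)
is true, and the same text read under 'single-float yields a single-float.
The reader default changes what gets created; it does not change what the
arithmetic functions accept.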
The assumption behind my proposal is that it should mean the most general,
and if you want a more specific, but more space-efficient, type, you use
a new name.
I think it should be up to the implementation what the default value of this
parameter is. It would be nonsense to have a Common Lisp that is primarily
sold in Japan have the default be 8 bits, and similarly nonsense for
one sold in the US to have the default be 16 bits. Most applications do
not use strings of more than one kind, although clearly many will.
So far as Symbolics is concerned, having STRING-CHAR mean a more specific
type would be LESS of a problem, since in the current Symbolics software,
STRING-CHAR means the 1-byte kind of characters.
STRING-CHAR, like CHARACTER as I've described it above, is a non-specific
type specifier, much like UNSIGNED-BYTE. Ultimately, I'd like to dump
CHAR-BITS in favor of a whole new concept which would be non-standard
generally, called KEY-CHORDS. CHAR-FONT is also out-the-window
as far as I'm concerned. Hence, I think there should be no
difference at all between STRING-CHAR and CHARACTER.
The trade-off, in terms of users' code, would be:
1) If STRING-CHAR is more general, users' code will get less efficient when
an implementation implements the new standard, but will work for all input.
Clearly, we need a global proclamation that says all strings
contain characters of a certain width, so that one can set the reader
default, give the proclamation, then compile, with no loss of
efficiency. How about
(PROCLAIM '(CHAR-CODE-SIZE 8))
2) If STRING-CHAR is more specific, users' code will retain its efficiency,
but may no longer work for the entire range of input found, say, in files or
other strings.
Whether case 2 would be viewed as an incompatibility or not depends on the
exact contract for the code in question. For example, a file copy or string
utility would definitely be regarded as having been broken by the change,
while other code might be regarded as just not taking advantage of a new feature.
...mike beckerle
Gold Hill
∂09-Mar-88 1230 CL-Characters-mailer subcommittee meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 9 Mar 88 12:29:55 PST
Date: Wed, 09 Mar 88 12:08:12 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
cc: Jan Zubkoff <edsel!jlz@labrea.stanford.edu>
Message-ID: <880309.120812.baggins@IBM.com>
Subject: subcommittee meeting
Our meeting room has been changed from Del Monte to the Regency-2
room. Also, I have some difficulty with a 9am start and would
like to change this to 10am.
Regards,
Thom
∂09-May-88 0743 CL-Characters-mailer back from travel
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 9 May 88 07:42:49 PDT
Date: Mon, 09 May 88 07:41:46 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880509.074146.baggins@IBM.com>
Subject: back from travel
I'm back from several weeks in Europe. This week I plan to draft
the changes discussed at the last PA meeting. Some notes from the
last meeting are also forthcoming.
Regards,
Thom
∂11-May-88 0833 CL-Characters-mailer june meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 11 May 88 08:33:15 PDT
Date: Wed, 11 May 88 08:27:00 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880511.082700.baggins@IBM.com>
Subject: june meeting
The choices for a subcommittee meeting in June are 13, 14, 17.
I believe one day will be sufficient and have a high school graduation
the 17th. So, from my end, June 14th is the reasonable selection.
Please respond asap as to whether you can:
1) attend the main x3j13 meeting (15,16 June)
2) attend a 14 June subcommittee meeting
3) prefer a different date(s) for the subcommittee meeting
Regards,
Thom
∂16-May-88 1353 CL-Characters-mailer June meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 May 88 13:53:41 PDT
Date: Mon, 16 May 88 13:47:16 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880516.134716.baggins@IBM.com>
Subject: June meeting
I had one request for an evening meeting on June 14 and no other
indicated preferences. I'm working on conference room arrangements
for that evening and will post a notice as soon as I have something
solid.
I am planning on arriving on Monday evening, returning Wednesday pm
after the 1st day of x3j13.
Regards,
Thom
∂16-May-88 1615 CL-Characters-mailer June meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 May 88 16:15:44 PDT
Date: Mon, 16 May 88 16:11:50 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880516.161150.baggins@IBM.com>
Subject: June meeting
An evening meeting room at Symbolics is apparently not possible.
Our meeting will now take place from 10-5 in the 'Bermuda' room (so bring
your tanning lotion).
I'm willing to also meet again at 7 in the hotel (at least to review) if
anyone has difficulty making the earlier time.
Regards,
Thom
========================================================================
Received: from STONY-BROOK.SCRC.Symbolics.COM by IBM.COM on 05/16/88 at 14:32:48 PDT
Received: from PEGASUS.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 405455; Mon 16-May-88 17:10
Received: by scrc-pegasus id AA00374; Mon, 16 May 88 16:49:46 edt
Date: Mon, 16 May 88 16:49:46 edt
From: Rosemary Bouzane <bouzane@scrc-pegasus>
To: baggins@ibm.com
Subject: Re: subcommittee meeting
Since Symbolics is a secured building we cannot accommodate your
request for Tuesday evening. However, I switched meetings around
so that we could do the following:
The Character Committee can now meet at Symbolics in
our Bermuda Conference Room - first floor 10:00-5:00.
∂20-May-88 0053 CL-Characters-mailer june meeting
Received: from AI.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 20 May 88 00:53:17 PDT
Date: Fri, 20 May 88 03:57:48 EDT
From: "Robert W. Kerns" <RWK@AI.AI.MIT.EDU>
Subject: june meeting
To: baggins@IBM.COM
cc: cl-characters@SAIL.STANFORD.EDU
In-reply-to: Msg of Wed 11 May 88 08:27:00 PDT from Thom Linden <baggins at ibm.com>
Message-ID: <381653.880520.RWK@AI.AI.MIT.EDU>
    Date: Wed, 11 May 88 08:27:00 PDT
    From: Thom Linden <baggins at ibm.com>
    Please respond asap as to whether you can:
    1) attend the main x3j13 meeting (15,16 June)
Yes.
    2) attend a 14 June subcommittee meeting
Yes.
    3) prefer a different date(s) for the subcommittee meeting
I would prefer a date sometime in July for the whole mess, but...
I believe Mike Beckerle will also attend, but he's off the net and
doesn't know the dates. (Gold Hill hopes to be back on the net soon,
but they've been off for quite a while, apparently.)
    Regards,
    Thom
∂25-Jun-88 1536 CL-Characters-mailer character proposal
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 25 Jun 88 15:36:26 PDT
Date: Sat, 25 Jun 88 15:32:26 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880625.153226.baggins@IBM.com>
Subject: character proposal
OK. I have made (I think) the changes to the proposal as discussed
in Boston. There are still a couple of points which need further
discussion or documentation:
   Simple strings: currently the document only specifies
      simple-base-string (simple-string is eliminated as
      ambiguous).
   External width: I believe this is still needed and Dick
      Waters indicated some need for this type of function
      at the Boston meeting. However, this is still contested.
   Standard # Macro Character Syntax: Is there a reasonable
      convention for 'named' extended characters? Perhaps
      #\character-set:index. For example, #\JISxxx:234.
   ?? and probably others.
At this point, I would like everyone to read the proposal
in depth. There are two sections 1) the overview and 2)
the detail changes to CLtL.
Read the first section for completeness and accuracy. Note it
doesn't have to cover every detail of change but needs to
say enough to understand the overall pattern of change.
For the second section, I suggest you mark up a fresh CLtL per the
proposal. This will help verify the paragraph numbers!
Then review the entire CLtL for consistency, accuracy and completeness.
(It turned out characters hit quite a variety of places, some easy
to miss!).
In all cases, please write up changes to the proposal in a
complete manner. I'm running out of time to type LaTeX.
If you are willing to completely rewrite some section,
feel free to do so (i.e., don't suggest I rewrite it).
I used very few features of LaTeX to create the document. I
expect they will be self-explanatory. Quiz me if not.
We need to vote the document out of committee in sufficient
time to distribute electronically before the next general
meeting in October. Therefore, I'm setting the first week
in August as our final vote target. In July, we need to
discuss and vote on any sub-issues. If there is strong
opposition (or proposition) by any member on some aspect
of the proposal, we'll bring it to a vote for settlement.
Gary Palter, as Bob Kerns and Mike Beckerle seem to have
net access problems, I'm asking you to distribute copies
of the proposal to them. (Thanks in advance). Let me know
if that poses any difficulties.
Any non-US colleagues listening into this discussion,
feel free to review the document as well! Please note that
this is still a working document of the subcommittee. Thus
your comments will probably have greater impact if given
now than later.
Regards,
Thom
∂27-Jun-88 0747 CL-Characters-mailer character proposal
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 27 Jun 88 07:47:11 PDT
Date: Mon, 27 Jun 88 07:44:37 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880627.074437.baggins@IBM.com>
Subject: character proposal
Well, I arrived this morning to find the proposal returned by
the postmaster as being too big. I'll work on a circumvention
this morning.
Regards,
Thom
∂27-Jun-88 0845 CL-Characters-mailer part 1
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 27 Jun 88 08:44:27 PDT
Date: Mon, 27 Jun 88 08:38:01 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880627.083801.baggins@IBM.com>
Subject: part 1
Part one of the proposal is appended to this note.
Regards,
Thom
-----------------------------------------------------------------------
\documentstyle{report} % Specifies the document style.
\pagestyle{headings}
\title{\bf DRAFT DRAFT:
Extensions to Common LISP to Support International
Character Sets}
\author{
Michael Beckerle\thanks{Gold Hill} \and
Paul Beiser\thanks{Hewlett-Packard} \and
Carl Hoffman\thanks{ILA Associates} \and
Robert Kerns\thanks{Independent consultant} \and
Kevin Layer\thanks{Franz LISP} \and
Thom Linden\thanks{IBM Research, Subcommittee Chair} \and
Larry Masinter\thanks{XEROX Research} \and
etc
}
\date{June 24, 1988} % Deleting this command produces today's date.
\begin{document}
\maketitle % Produces the title.
\setcounter{secnumdepth}{4}
\setcounter{tocdepth}{4}
\tableofcontents
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\newfont{\cltxt}{cmr10}
\newfont{\clkwd}{cmtt10}
\newcommand{\apostrophe}{\clkwd '}
\newcommand{\bq}{\clkwd\symbol{'22}}
\newcommand{\editstart}{\begin{tabbing}
12345 \= \kill %set tab1
\bf$\Rightarrow$\ddag}
\newcommand{\editend}{\end{tabbing}}
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Introduction}
This is a proposal for both extending and modifying the Common LISP
language definition to provide a standard basis for Common LISP
support of the variety of character sets used to represent the
native languages of the international community.
This proposal was created by the Character Subcommittee of X3J13.
We would like to acknowledge the JEIDA proposal \cite{ida87}
as well as the
proposals \cite{linden87} and \cite{kerns87} for
providing the initial motivation and direction for these extensions.
As all three documents \cite{ida87,linden87,kerns87} were created
expressly for Common LISP standardization usage,
we have borrowed freely from their ideas as well as the texts
themselves.
This document is separated into two parts. The first part explains the
major language changes and their motivations. The second part,
Appendix A, provides
the page-by-page set of editorial changes to \cite{steele84}.
\section{Objectives}
The major objectives of this proposal are:
\begin{itemize}
\item To provide a consistent, well-defined scheme allowing support
of both very large character sets and multiple character sets.
Many native
languages, such as Japanese and Chinese, use character
sets which contain more characters than the Roman alphabet.
Supporting larger sized character sets frequently means employing
larger data fields to uniquely encode each
character.
Common LISP implementations using
larger sized character sets
can
incur performance penalties in terms
of space, time, or both.
Many software applications are intended for international use, or
have requirements for incorporation of language elements of multiple
native
languages within a single application.
In order
to ensure some portability of these applications, data expressed in
a mixture of
native
languages must be treated consistently by the
software language.
\item To ensure efficient performance of string and character
operations.
The use of large and/or multiple character sets by an implementation
implies the need for more complex character type representation. If
more complex character type representation is employed, the efficiency
of language operations on characters (e.g. string operations)
could be affected.
\item To assure forward compatibility of the proposed model
and definition with existing Common LISP implementations.
Developers should not be required to re-write large amounts of either
LISP code or data representations in order to apply the proposed
changes to existing implementations.
The proposed changes should provide an easy
portability path for existing code to many possible implementations.
\end{itemize}
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Overview}
We use several terms within this document which
may differ somewhat from
conventional usage. Definitions for the following prominent
terms are provided for the reader's convenience.
A {\em character repertoire} defines a collection of characters
independent of their specific rendered image or font. Character
repertoires are specified independent of coding and their characters
are only identified with a unique label, a graphic symbol, and
a character description.
Once defined, a character repertoire must be
{\em encoded} to allow a one-to-one mapping between a character
and a number that serves as the character code. Once a repertoire
is encoded it is called a {\em coded character set}.
In Common LISP a {\em character} data object is identified by its
{\em character code}, a unique numerical code identification.
Each character code is composed from
a {\em character set identifier},
shared by all characters of a particular character
set, and a {\em character set index}, a numerical identification which
is unique within a particular character set.
Character data objects which are classified as {\em graphic},
or displayable, are each associated with a {\em glyph}. The
glyph is the visual representation of the character.
%----------------------------------------------------------------------
\section{Character Identity}
Characters are uniquely distinguished by their codes,
which are drawn from the set of
non-negative integers.
It is important to separate the notion of glyph from the notion of
character data object when defining a scheme under which issues of
identity can be rigorously decided by a computer language. Glyphs are
the visual aspects of characters, writable on surfaces, and sometimes
called ``graphics''. A language specification valid for more than a
narrow range of systems can only make assumptions about the existence
of {\em abstract} glyphs (for example, the Latin letter A) and not about
glyph variants (for example, the italicized Latin letter {\em A})
\footnote{These latter are often referred to as {\em designer} glyphs.}
or characteristics of display devices. Thus, a key element of this
proposal is the removal of the {\em font} and {\em bits}
attributes from the language specification.\footnote{These and other
attributes may still be supported by an implementation but they
are extensions which do not affect the identity of the character
object.}
Character codes are composed from a character set identifier and a
character set index.
Within a given character set, individual member
characters are distinguished by character set index.
\footnote{
We specifically do not propose any standard encoding for
any character repertoires.
}
An implementation need
not support more than one character set, the {\em base} character set.
If it does support multiple
character sets, it must define the sets supported and
their characteristics. Character set identifiers are assigned to
character sets by the implementation.
\footnote{
We also do not propose any standard character set
identifiers but names such as {\clkwd :ISO8859-1988} come to mind.
}
Characters within the base character set are referred to as
{\em base characters}. Characters not in the base character set
are referred to as {\em extended characters}.
One ramification is that the distinction between {\clkwd string-char}
and {\clkwd character} is eliminated. {\bf All} characters can be
inserted into (type compatible) strings.
For compatibility, {\clkwd string-char}
is defined as equivalent to {\clkwd character}. All functions
dealing with the {\em bits} and {\em font} attributes are either
removed or modified by this proposal.
A second ramification is that character codes now have two components,
and various character predicates must be modified to deal with them.
The convention by which the character set index
and character set identifier are composed into a single integer code
is implementation dependent.
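
As an illustration only (this proposal mandates no particular layout), one
possible convention places the character set identifier in the high-order
bits of the code and the character set index in the low-order 16 bits.
The names below are hypothetical and are not part of the proposal; the
sketch also assumes that identifiers are represented internally as small
non-negative integers.

\begin{verbatim}
(defparameter *index-bits* 16
  "Hypothetical width, in bits, of the character set index field.")

(defun compose-code (set-id index)
  "Pack SET-ID and INDEX into a single non-negative integer code."
  (assert (typep index `(unsigned-byte ,*index-bits*)))
  (+ (ash set-id *index-bits*) index))

(defun code-set-id (code)
  "Recover the character set identifier from CODE."
  (ash code (- *index-bits*)))

(defun code-set-index (code)
  "Recover the character set index from CODE."
  (ldb (byte *index-bits* 0) code))
\end{verbatim}

Under this layout, set identifier 0 with index 65 composes to the code 65,
so a base character set assigned identifier 0 would keep its existing codes.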
A third ramification
is that the {\clkwd characterp} predicate is extended to
support testing
membership of a character in a given character repertoire
or subrepertoire.
\footnote{
For example,
testing membership in the Kanji subrepertoire.
}
The
intent of the provision for multiple character sets
is that
native
language glyph sets (with associated digits and
punctuation)
\footnote{For example, the glyphs on the keycaps of a particular
terminal, or any other glyph sets with a common use in graphics or
symbolic communication.
}
supported by user display
hardware should each be mapped by the I/O interface
into its own character set inside
LISP, all the members of which
share a common character set identifier.
\footnote{Of course, an implementation would be free to decide if and
how supported glyphs should be differentiated into sets.
}
Which glyph sets are supported by the overall computing system, the
details of the mapping of
glyphs to character set indices, and the particular character set
identifiers used, are left unspecified by Common LISP.
The diversity of glyph sets and character
encoding conventions in use worldwide and the desirability
of allowing LISP to manipulate symbolic elements from many
languages, perhaps simultaneously, mandate such a flexible approach.
%----------------------------------------------------------------------
\section{Hierarchy of Types}
A Common LISP
implementation is required to support at least one character
repertoire: the {\em base character repertoire}.
The base character repertoire
is distinguished from every other supported character repertoire in
several respects:
\begin{itemize}
\item
The standard characters are a subrepertoire of the base characters.
\item
Only members of the base character repertoire
can be elements of a base string.
\item
The base characters are, in general, the default characters for I/O
operations.
\end{itemize}
No upper bound is specified for the number of glyphs in the base
character repertoire; that
is implementation dependent. The lower bound is 96, the
number of standard characters defined for Common LISP.
We use the term {\em extended} to describe character repertoires beyond
the base repertoire.
The following type specifier is added as a subtype
of {\clkwd character}.
\begin{itemize}
\item {\clkwd base-character}
\end{itemize}
The distinction of a base character set is largely a pragmatic
choice. It permits efficient handling of common situations, is
in some sense privileged for host system I/O, and can serve as an
intermediate basis for portability, less general than the standard
characters, but possibly more useful across a narrower range of
implementations.
Most computers have some ``natural'' character representation which
is a function of hardware instructions for dealing with characters,
as well as the organization of the file system. The natural character
representation is likely to be the smallest transaction unit permitted
for text file and terminal I/O operations. On a system with a record
based I/O paradigm, the natural character representation is likely to
be the smallest record quantum. On many computer systems,
this representation is a byte.
However, there are often multiple character sets supportable on a
computer, through the use of special display and entry hardware, which
are varying interpretations of the basic system character
representation. For example, EBCDIC and extended ASCII are two
different interpretations of the same 1-byte code representations.
Many countries have their own glyph-to-code mappings for 1-byte
character codes addressing the special requirements of national
languages. Differentiating between these sets, without reference to
display hardware, is a matter of convention, since they all use the
same set of code representations. When a single byte is not enough,
two or more bytes are sometimes used for character encoding. This
makes character handling even more difficult on machines where the
natural representation size is a byte, since not only is the semantic
value of a character code a matter of convention, which may vary
within the same computing system, but so is the identification of a
set of bits as a complete character code.
It is the intention of this proposal that the base character set of
Common LISP
be the natural characters of the host system: its composition
should be
determined by the code capacity of the natural file system and I/O
transaction representations, and its assumed display glyphs should be
those of the terminals most commonly employed.
There are several advantages to this scheme. Internal representation
of strings of just base characters can be more compact than
strings including extended characters.
Source programs are likely to consist predominantly of base characters
since the standard characters are a subset of the base character
repertoire. Parsing of pure base character text
can be more efficient than parsing of text including
extended characters.
I/O can be performed more simply
with base characters,
and they can be used as a basis for data representations to
be shared with other LISP sessions with potentially different
character set definitions or non-LISP processes.
{\em Implementation note}:
Although the readtable must be capable of
holding syntax information for all characters, the data
structure(s) used internally for the readtable may be segmented
into a section for each defined character set. Access for
base character syntax during the parsing of base strings may
be quicker than the general case since the table section is the
same for all component characters, and entries may be accessed
with a single index by code point.
The standard characters are the 96 characters used in the Common LISP
definition {\bf or their equivalents}.
This was the Common LISP \cite{steele84} definition, but
{\em equivalents} is a vague term.
The standard characters are not defined by their glyphs, but by their
roles within the language. There are two aspects to the roles of the
standard characters: one is their role in reader and format control
string syntax; the second is their role as components of the names of
all Common LISP
functions, macros, constants, and global variables. As
long as an implementation chooses 96 characters
and treats those 96 in a manner consistent with
the language's specification for the standard characters (e.g.
the naming of functions), it doesn't matter what glyphs the I/O
hardware uses to represent those characters: they are the standard
characters. Any program or
data text written wholly in those characters
is portable through simple code conversion.
A mechanism, such as in \cite{linden87}, which supports establishment of
equivalency between distinct characters is not excluded by
this proposal.
\footnote{But, as with the font character attribute,
is not a mechanism standardized by the Common LISP definition.}
In general, the authors of this proposal favor the long
term solution of ISO standardization of non-overlapping
character repertoires.
The {\clkwd string} type
is defined as
a vector of characters. More precisely, a string
is a specialized vector whose elements are of type
{\clkwd character} or a subtype of character. There are three string
types distinguished with standardized names: {\clkwd base-string},
{\clkwd most-general-string}, and {\clkwd simple-base-string}.
A base string can only contain base characters. A
{\clkwd most-general-string}
can contain any implementation supported base or extended characters,
in any mixture.
All Common LISP functions defined to operate on strings operate
consistently on base strings and extended strings with the following
caveat: for any function which inserts a character into a string, it
is an error to insert an extended character
into a base string.
The {\clkwd coerce} function is extended to
allow for explicit coercion between base strings and extended strings.
During reader
construction of symbols, if all the characters
in the symbol's name are of type {\clkwd base-character},
then the name of the symbol will be stored as a base string.
Otherwise it will be stored as an extended string.
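{\em Illustration}:
The rule for symbol names can be modelled by a small function that
chooses between the two representations. The function name
{\clkwd name-string-kind} and the test used for base characters
(codes below 256) are assumptions of this sketch, not part of the
proposal.
\begin{verbatim}
(defun name-string-kind (name)
  "Return :BASE-STRING if every character of NAME is a base character
(assumed here to mean a code below 256), else :EXTENDED-STRING."
  (if (every (lambda (char) (< (char-code char) 256)) name)
      :base-string
      :extended-string))

(name-string-kind "CAR")   ; :BASE-STRING under the assumption above
\end{verbatim}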
The base string type allows for more compact representation of strings
of base characters, which are likely to predominate in any system.
Note that in any particular implementation the base character set
need not be the
most compactly representable character set, since another might have
fewer code points. However, in most implementations base strings are
likely to be more space efficient than extended strings.
It has been suggested that either a single string type is
sufficient for large character set Common LISP implementations,
or that a hierarchy of string types could be used, in a manner
transparent to the user. A desire to flexibly support many different
character sets without compromising the efficiency of ordinary
applications led us to accept the need for more than one string type.
We believe that these choices reflect a minimal
modification of this aspect of the type system, and that
exposing the string types for user programs to negotiate in their own
way is the most reasonable approach.
%----------------------------------------------------------------------
\section{Streams and System I/O}
Much of the work of ensuring that a
Common LISP implementation operates
correctly in a multiple character set environment must be performed by
the I/O interface.
The system I/O interface, abstracted in
Common LISP as streams, is responsible
for ensuring that text input from outside LISP is properly mapped
into character sets internally, and that the inverse mapping
\footnote{Such an inverse may not exist.
An implementation might legally fold multiple
external character sets into a single internal set on input
(e.g. EBCDIC and ASCII).
}
is performed on output. It is beyond the scope of a language
definition to specify the details of this operation, but options
are specified which allow runtime indication from the user as to
what character sets a stream uses, and how the mappings
should be done. It is expected that implementations will provide
reasonable defaults and invocation options to accommodate desired use
at an installation.
In addition to supporting conversion at the system interface, the
language must allow user programs to determine how much space data
objects will require when output in whichever external representations
are available.
Two keyword arguments are proposed as additions to {\clkwd open}:
\begin{itemize}
\item {\clkwd :character-set}
whose value would be:
\begin{itemize}
\item A name or list of names of
defined character sets in the form of keywords.
The default is the base character set when
{\clkwd :external-code-format} is also defaulted. If a non-default
value is specified for {\clkwd :external-code-format}, there may be a
different default for {\clkwd :character-set}.
\end{itemize}
\item {\clkwd :external-code-format}
whose value would be:
\begin{itemize}
\item
A keyword indicating an implementation-recognized scheme for
representing one or more character sets with non-homogeneous codes.
The default is the natural system character representation,
the base character representation.
As many {\clkwd :character-set} names must be provided as the
implementation requires for that external coding convention.
\footnote{
For example, the SO/SI SBCS/DBCS convention used by IBM on 370
machines could be selected by a keyword like
{\clkwd :shift-delimited}.
The compact run-encoding convention defined by XEROX could be
selected by {\clkwd :run-encoded}.
The SBCS/DBCS convention based on
ASCII which uses leading bit patterns to distinguish two-byte codes
from one-byte codes could be selected by a keyword like
{\clkwd :high-byte-delimited}.
For example, if {\clkwd :shift-delimited} were the
{\clkwd :external-code-format} argument, two character set specifiers
would have to be provided.
}
\end{itemize}
\end{itemize}
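{\em Illustration}:
The requirement that enough {\clkwd :character-set} names accompany
an {\clkwd :external-code-format} could be checked along the following
lines. The table {\clkwd *required-character-sets*}, the function
{\clkwd check-character-sets}, and the character set names
{\clkwd :latin} and {\clkwd :kanji} are hypothetical. Only the count
for {\clkwd :shift-delimited} is taken from the footnote above; the
other count is an assumption.
\begin{verbatim}
;; Hypothetical table: number of character set names each external
;; coding convention needs.
(defparameter *required-character-sets*
  '((:shift-delimited . 2)        ; stated in the footnote
    (:high-byte-delimited . 2)))  ; assumed for this sketch

(defun check-character-sets (external-code-format character-set)
  "True if CHARACTER-SET (a keyword or a list of keywords) supplies as
many names as EXTERNAL-CODE-FORMAT requires.  Formats missing from the
table are assumed to require exactly one name."
  (let ((names (if (listp character-set)
                   character-set
                   (list character-set)))
        (required (or (cdr (assoc external-code-format
                                  *required-character-sets*))
                      1)))
    (= (length names) required)))

(check-character-sets :shift-delimited '(:latin :kanji))  ; true
(check-character-sets :shift-delimited :latin)            ; false
\end{verbatim}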
These arguments are provided for input, output, and I/O
(bidirectional) streams. All characters read from the streams will be
members of the character sets specified by the {\clkwd :character-set}
argument. It is an error to try to write a character other than a
member of
the specified sets to a stream. (This includes the
\#$\backslash${\clkwd Newline} character.
Implementations should provide for appropriate line division behavior
through the function {\clkwd terpri}.)
The new function {\clkwd external-width} takes a character object
or string as its required argument. It also takes an optional
{\em output-stream}.
It returns the number of host system character
representation quantum units
\footnote{
Same as the storage width of a base character, usually a byte.
}
required to externally store that object, using the indicated
representation convention. If the item cannot be represented in
that convention, the function returns {\clkwd nil}.
This function is necessary
to determine if internal strings can be written to fixed length
fields in databases or terminal screen templates. Note that this
function addresses the problem of storage width, and does not
address the problem of display width, which may involve calculating
screen width of strings printed in proportional fonts.
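{\em Illustration}:
To make the storage width calculation concrete, the following sketch
computes a width for one possible shift-delimited convention. It is not
the proposed {\clkwd external-width}: the name
{\clkwd sketch-external-width} is hypothetical, the optional
{\em output-stream} argument is omitted, and the encoding rules (one
unit for codes below 256, two units for codes below 65536, one unit for
each shift code) are assumptions made for the example.
\begin{verbatim}
(defun sketch-external-width (object)
  "Width of OBJECT (a character or string) in one-byte units under an
assumed shift-delimited encoding.  Returns NIL if some character
cannot be represented."
  (let ((width 0)
        (shifted nil))                 ; inside a run of extended characters?
    (map nil
         (lambda (char)
           (let ((code (char-code char)))
             (cond ((< code 256)
                    (when shifted      ; shift-in closes the extended run
                      (incf width)
                      (setf shifted nil))
                    (incf width 1))
                   ((< code 65536)
                    (unless shifted    ; shift-out opens an extended run
                      (incf width)
                      (setf shifted t))
                    (incf width 2))
                   (t (return-from sketch-external-width nil)))))
         (string object))
    ;; A trailing extended run still needs its closing shift-in.
    (if shifted (1+ width) width)))

(sketch-external-width "abc")   ; 3 when a, b, and c have codes below 256
\end{verbatim}
A program could compare the result against the length of a fixed
field before attempting to write the string.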
An implementation supporting multiple character sets
must allow for the external and
internal representation of characters to be separately (and perhaps
multiply) specified to {\clkwd open},
since there can be circumstances under
which more than one external representation for an internal character
set is in use, or more than one character set is mixed together in an
external representation convention.
%----------------------------------------------------------------------
%----------------------------------------------------------------------
∂27-Jun-88 0848 CL-Characters-mailer part2
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 27 Jun 88 08:45:32 PDT
Date: Mon, 27 Jun 88 08:39:36 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880627.083936.baggins@IBM.com>
Subject: part2
Part 2 of the proposal is appended to this note.
Regards,
Thom
-----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\appendix
\chapter{Editorial Modifications to CLtL}
The following sections specify the editorial changes needed in
CLtL to support the proposal. Section/subsection numbers and titles
match those found in \cite{steele84}. The notation
{\bf $\Rightarrow$\ddag x} denotes a reference to paragraph x within the
subsection (we count each individual example or metastatement
as 1 paragraph of text).
%----------------------------------------------------------------------
\setcounter{section}{1}
\section{Data Types} % 2
%----------------------------------------------------------------------
\editstart 8 replace
\+
\\ \sf
rich character set, including ways to represent characters of various
type styles.
\-
\\ \bf with
\+
\\ \cltxt
rich character repertoire.
\-
\editend
\setcounter{subsection}{1}
\subsection{Characters} % 2.2.
\editstart 1 replace
\+
\\ \cltxt
Characters are represented as data objects of type {\clkwd character}.
\\
There are two subtypes of interest, called
{\clkwd standard-char} and {\clkwd string-char}.
\-
\\ \bf with
\+
\\ \cltxt
Characters are represented as data objects of type
{\clkwd character}.
\-
\editend
\editstart 2 replace
\+
\\ \cltxt
This works well enough for printing characters. Non-printing
characters
\-
\\ \bf with
\+
\\ \cltxt
This works well enough for graphic characters. Non-graphic
characters
\-
\editend
\subsubsection{Standard Characters} % 2.2.1.
\editstart 0 replace section heading
\+
\\ \cltxt
Standard Characters
\-
\\ \bf with
\+
\\ \cltxt
Base Characters
\-
\editend
\editstart 1 insert before
\+
\\ \cltxt
Most computers have some ``base'' character representation which
is a function
\\
of hardware instructions for dealing with characters, as well as
the organization of
\\
the file system. This base character representation is likely
to be the smallest
\\
transaction unit permitted for text stream I/O operations.
\\
The base character representation (often a byte) supports an
implementation specific
\\
{\em coded base character set} such as the ASCII and the EBCDIC
coded character sets.
\\
The {\em base character repertoire} is defined as
the collection of characters
\\
contained in the coded base character set. Common LISP does
not define the base
\\
character encoding
but does require all implementations to support a ``standard''
\\
{\em subrepertoire} of the base character
repertoire.
\-
\editend
\editstart 1 insert before
\+
\\ \cltxt
The {\clkwd base-character} type is defined as a subtype of
{\clkwd character}. A {\clkwd base-character} object can
\\
contain any member of the base character repertoire. Objects of
type
\\
{\clkwd (and character (not base-character))} are referred to
as {\em extended characters}.
\-
\editend
\editstart 1 replace
\+
\\ \cltxt
Common LISP defines a ``standard character set'' (subtype
{\clkwd standard-char}) for two
\\
purposes. Common LISP programs that are written in the
standard character set
\\
can be read by any Common LISP implementation; and Common LISP
programs
\\
that use only standard characters as data objects are most likely
to be portable. The
\\
Common LISP character set consists of a space character
\#$\backslash${\clkwd Space}, a newline
\\
\#$\backslash${\clkwd Newline}, and the following ninety-four
non-blank printing characters or their equivalents:
\-
\\ \bf with
\+
\\ \cltxt
As a subset of the base character repertoire,
Common LISP defines a standard character
\\
subrepertoire for two purposes.
\\
Common LISP programs that are written in the
standard character subrepertoire
\\
can be read by any Common LISP implementation; and Common LISP
programs
\\
that use only standard characters as data objects are most likely
to be portable.
\\
The standard characters are not defined by their glyphs, but by their
roles within
\\
the language. There are two aspects to the roles of the
standard characters:
\\
one is their role in reader and format control
string syntax; the second is their role as
\\
components of the names of all Common LISP
functions, macros, constants, and global variables. As
\\
long as an implementation chooses 96 characters
and treats those 96 in a manner consistent with
\\
the language's specification for the standard characters
(for example,
the naming of functions),
\\
it doesn't matter what glyphs the I/O
hardware uses to represent those characters: they are
\\
the standard characters. Any program or
data text written wholly in those characters
\\
is portable through simple code conversion.
The Common LISP standard character subrepertoire
\\
consists of a space character \#$\backslash${\clkwd Space}, a newline
\#$\backslash${\clkwd Newline}, and
\\
the following ninety-four graphic characters or their equivalents:
\-
\editend
\editstart 1 insert the following table:
{\bf Common LISP Standard Character Subrepertoire}
\footnote{\#$\backslash${\clkwd Space}
and \#$\backslash${\clkwd Newline} are omitted.
Graphic identifiers and descriptions are from ISO 6937/2.}
\editend
{\small \begin{tabular}{||l|c|l||l|c|l||} \hline
ID & Glyph & Name or description
& ID & Glyph & Name or description
\\ \hline
LA01 & a & small a
& ND01 & 1 & digit 1
\\ \hline
LA02 & A & capital A
& ND02 & 2 & digit 2
\\ \hline
LB01 & b & small b
& ND03 & 3 & digit 3
\\ \hline
LB02 & B & capital B
& ND04 & 4 & digit 4
\\ \hline
LC01 & c & small c
& ND05 & 5 & digit 5
\\ \hline
LC02 & C & capital C
& ND06 & 6 & digit 6
\\ \hline
LD01 & d & small d
& ND07 & 7 & digit 7
\\ \hline
LD02 & D & capital D
& ND08 & 8 & digit 8
\\ \hline
LE01 & e & small e
& ND09 & 9 & digit 9
\\ \hline
LE02 & E & capital E
& ND00 & 0 & digit 0
\\ \hline
LF01 & f & small f
& SC03 & \$ & dollar sign
\\ \hline
LF02 & F & capital F
& SP02 & ! & exclamation mark
\\ \hline
LG01 & g & small g
& SP04 & " & quotation mark
\\ \hline
LG02 & G & capital G
& SP05 & \apostrophe & apostrophe
\\ \hline
LH01 & h & small h
& SP06 & ( & left parenthesis
\\ \hline
LH02 & H & capital H
& SP07 & ) & right parenthesis
\\ \hline
LI01 & i & small i
& SP08 & , & comma
\\ \hline
LI02 & I & capital I
& SP09 & \_ & low line
\\ \hline
LJ01 & j & small j
& SP10 & - & hyphen or minus sign
\\ \hline
LJ02 & J & capital J
& SP11 & . & full stop, period
\\ \hline
LK01 & k & small k
& SP12 & / & solidus
\\ \hline
LK02 & K & capital K
& SP13 & : & colon
\\ \hline
LL01 & l & small l
& SP14 & ; & semicolon
\\ \hline
LL02 & L & capital L
& SP15 & ? & question mark
\\ \hline
LM01 & m & small m
& SA01 & + & plus sign
\\ \hline
LM02 & M & capital M
& SA03 & $<$ & less-than sign
\\ \hline
LN01 & n & small n
& SA04 & = & equals sign
\\ \hline
LN02 & N & capital N
& SA05 & $>$ & greater-than sign
\\ \hline
LO01 & o & small o
& SM01 & \# & number sign
\\ \hline
LO02 & O & capital O
& SM02 & \% & percent sign
\\ \hline
LP01 & p & small p
& SM03 & \& & ampersand
\\ \hline
LP02 & P & capital P
& SM04 & * & asterisk
\\ \hline
LQ01 & q & small q
& SM05 & @ & commercial at
\\ \hline
LQ02 & Q & capital Q
& SM06 & [ & left square bracket
\\ \hline
LR01 & r & small r
& SM07 & $\backslash$ & reverse solidus
\\ \hline
LR02 & R & capital R
& SM08 & ] & right square bracket
\\ \hline
LS01 & s & small s
& SM11 & \{ & left curly bracket
\\ \hline
LS02 & S & capital S
& SM13 & $|$ & vertical bar
\\ \hline
LT01 & t & small t
& SM14 & \} & right curly bracket
\\ \hline
LT02 & T & capital T
& SD13 & \bq & grave accent
\\ \hline
LU01 & u & small u
& SD15 & $\hat{ }$ & circumflex accent
\\ \hline
LU02 & U & capital U
& SD19 & $\tilde{ }$ & tilde
\\ \hline
LV01 & v & small v
& & &
\\ \hline
LV02 & V & capital V
& & &
\\ \hline
LW01 & w & small w
& & &
\\ \hline
LW02 & W & capital W
& & &
\\ \hline
LX01 & x & small x
& & &
\\ \hline
LX02 & X & capital X
& & &
\\ \hline
LY01 & y & small y
& & &
\\ \hline
LY02 & Y & capital Y
& & &
\\ \hline
LZ01 & z & small z
& & &
\\ \hline
LZ02 & Z & capital Z
& & &
\\
\hline
\end{tabular} }
\editstart 2 delete
\editend
\editstart 3 delete
\editend
\editstart 4 delete
\editend
\editstart 5 delete
\editend
\editstart 6 replace
\+
\\ \cltxt
Of the ninety-four non-blank printing characters
\-
\\ \bf with
\+
\\ \cltxt
Of the ninety-four graphic characters
\-
\editend
\editstart 10 delete
\editend
\editstart 11 delete
\editend
\subsubsection{Line Divisions} % 2.2.2.
\subsubsection{Non-standard Characters} % 2.2.3.
\editstart delete entire section
\editend
\subsubsection{Character Attributes} % 2.2.4.
\editstart 1 delete
\editend
\editstart 1 new
\+
\\ \cltxt
Every object of type {\clkwd character} has three attributes:
\\
{\sf code, character-set}, and {\sf character-set-index}.
\\
Character identity is uniquely distinguished by either the code
\\
attribute or the combined character-set and character-set-index
attributes.
\\
\\
{\bf Note: Bob Kerns is reworking the following paragraph}
\\
\\
If an implementation has additional attributes of characters,
\\
dealing with how the character is displayed or its typography,
\\
these attributes are not part of the code, character-set or
\\
character-set-index attributes. For example, bold-face, color
\\
or size are not considered part of the identity of a character
\\
and are not included. Case, however, is part of the character
identity.
\\
In symbol construction, implementation defined attributes such as
\\
color are removed.
\\
It is implementation dependent whether characters within
\\
double quotes have any implementation defined attributes removed.
\\
If two characters have identical implementation defined attributes,
then their ordering by
\\
{\clkwd char}$<$ is consistent with the numerical ordering by the
predicate $<$ on their code
\\
attributes.
\-
\editend
\editstart 2 delete
\editend
\editstart 3 delete
\editend
\editstart 4 delete
\editend
\editstart 5 delete
\editend
\subsubsection{String Characters} % 2.2.5.
\editstart delete this section
\editend
\subsection{Symbols} % 2.3.
\editstart 12 replace
\+
\\ \cltxt
A symbol may have uppercase letters, lowercase letters, or both
in its print name.
\-
\\ \bf with
\+
\\ \cltxt
A symbol may have characters from any supported character repertoire
\\
in its print name.
\\
It may have uppercase letters, lowercase letters, or both.
\-
\editend
\setcounter{subsection}{4}
\subsection{Arrays}
\subsubsection{Vectors}
\editstart 6 replace
\+
\\ \cltxt
All implementations provide specialized arrays for the cases when
\\
the components are characters (or rather, a special subset of the
characters);
\-
\\ \bf with
\+
\\ \cltxt
All implementations provide specialized arrays for the cases when
\\
the components are characters (or optionally, special subsets of
the characters);
\-
\editend
\subsubsection{Strings}
\editstart 1 replace
\+
\\ \cltxt
A string is simply a vector of characters. More precisely, a string
\\
is a specialized vector whose elements are of type
{\clkwd string-char}.
\-
\\ \bf with
\+
\\ \cltxt
A string is simply a vector of characters. More precisely, a string
\\
is a specialized vector whose elements are of type
{\clkwd character} or a subtype of character.
\-
\editend
\setcounter{subsection}{14}
\subsection{Overlap, Inclusion, and Disjointness of Types} % 2.15.
\editstart 14 replace
\+
\\ \cltxt
The type {\clkwd standard-char} is a subtype of {\clkwd string-char};
\\
{\clkwd string-char} is a subtype of {\clkwd character}.
\-
\\ \bf with
\+
\\ \cltxt
{\bf Compatibility note: -------------}
\\
The type {\clkwd standard-char} is a subtype of {\clkwd character}.
\\
The type {\clkwd string-char} means {\clkwd character}. Both
\\
are retained for compatibility with earlier versions of Common LISP.
\\
{\bf --------------------------------------------}
\-
\editend
\editstart 15 replace
\+
\\ \cltxt
The type {\clkwd string} is a subtype of {\clkwd vector},
\\
for {\clkwd string} means {\clkwd (vector string-char)}.
\-
\\ \bf with
\+
\\ \cltxt
The type {\clkwd string} is a subtype of {\clkwd vector};
\\
{\clkwd string} consists of vectors specialized by subtypes of
{\clkwd character}.
\-
\editend
\editstart 15 insert after
\+
\\ \cltxt
The type {\clkwd most-general-string} is equivalent to
\\
{\clkwd (vector character)} and is a subtype of {\clkwd string}.
\-
\editend
\editstart 15 insert new paragraph
\+
\\ \cltxt
The type {\clkwd base-string} is equivalent to
\\
{\clkwd (vector base-character)}.
\-
\editend
\editstart 20 replace
\+
\\ \cltxt
{\clkwd (simple-array string-char (*))};
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd (simple-array character (*))};
\-
\editend
\editstart 20 insert after
\+
\\ \cltxt
The type {\clkwd simple-base-string} is equivalent to
\\
{\clkwd (simple-array base-character (*))} and
\\
is the most efficient string which can hold
the standard character repertoire.
\-
\editend
%----------------------------------------------------------------------
\setcounter{section}{3}
\section{Type Specifiers} % 4
%----------------------------------------------------------------------
\setcounter{subsection}{1}
\subsection{Type Specifier Lists} % 4.2.
\editstart 8 remove from table 4-1 (alphabetic list)
\+
\\ \cltxt
{\clkwd standard-char}
\\
{\clkwd string-char}
\-
\editend
\editstart 8 insert into table 4-1 (alphabetic list)
\+
\\ \cltxt
{\clkwd base-character}
\\
{\clkwd most-general-string}
\\
{\clkwd simple-base-string}
\-
\editend
\setcounter{subsection}{2}
\subsection{Predicating Type Specifiers} % 4.3.
\editstart 2 delete
\editend
\editstart 3 delete the example
\editend
\setcounter{subsection}{5}
\subsection{Type Specifiers That Abbreviate} % 4.6.
\editstart 20 replace
\+
\\ \cltxt
Means the same as {\clkwd (array string-char ({\em size}))}: the set of
strings of the indicated size.
\\
\-
\\ \bf with
\+
\\ \cltxt
Means the union of the vector types of the indicated size that are
specialized by subtypes of {\clkwd character}.
\-
\editend
\editstart 23 replace
\+
\\ \cltxt
Means the same as {\clkwd (simple-array string-char ({\em size}))}: the
\\
set of simple strings of the indicated size.
\\
\-
\\ \bf with
\+
\\ \cltxt
Means the same as {\clkwd (simple-array character ({\em size}))}: the
\\
set of simple strings of the indicated size.
\-
\editend
\editstart 23 insert after
\+
\\ \cltxt
{\clkwd (base-string {\em size})}
\\
Means the same as {\clkwd (array base-character ({\em size}))}: the
\\
set of base strings of the indicated size.
\\
\-
\editend
\editstart 23 insert after
\+
\\ \cltxt
{\clkwd (simple-base-string {\em size})}
\\
Means the same as {\clkwd (simple-array base-character ({\em size}))}:
\\
the set of simple base strings of the indicated size.
\\
\-
\editend
%----------------------------------------------------------------------
\setcounter{section}{5}
\section{Predicates} % 6
%----------------------------------------------------------------------
\editstart 2 replace
\+
\\ \cltxt
but {\clkwd standard-char} begets {\clkwd standard-char-p}
\-
\\ \bf with
\+
\\ \cltxt
but {\clkwd bit-vector} begets {\clkwd bit-vector-p}
\-
\editend
\setcounter{subsection}{1}
\subsection{Data Type Predicates} % 6.2.
\setcounter{subsubsection}{1}
\subsubsection{Specific Data Type Predicates} % 6.2.2.
\editstart 36 replace
\+
\\ \cltxt
{\clkwd characterp} {\em object}
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd characterp} {\em object} \&{\clkwd optional}
({\em repertoire})
\-
\editend
\editstart 37 replace
\+
\\ \cltxt
{\clkwd characterp} is true if its argument is a character, and
otherwise is false.
\\
\-
\\ \bf with
\+
\\ \cltxt
If {\em repertoire} is omitted, {\clkwd characterp}
is true if its argument is a character object, and otherwise is false.
\\
\\
If a {\em repertoire} keyword argument is specified,
{\clkwd characterp} is true if its argument is a
\\
character object and a member of the specified repertoire
or subrepertoire, and otherwise is false.
\\
For example, {\clkwd (characterp \#$\backslash$A}
{\clkwd :standard)}
\\
is true since \#$\backslash$A is a member of the standard character
subrepertoire.
\-
\editend
\editstart 38 replace
\+
\\ \cltxt
{\clkwd (characterp x) $\equiv$ (typep x \apostrophe character)}
\\
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd (characterp x :standard) $\equiv$ (typep x \apostrophe
(character :standard))}
\-
\editend
\editstart 72 replace
\+
\\ \cltxt
See also {\clkwd standard-char-p, string-char-p, streamp,}
\\
\-
\\ \bf with
\+
\\ \cltxt
See also {\clkwd standard-char-p, streamp,}
\-
\editend
\setcounter{subsubsection}{2}
\subsubsection{Equality Predicates} % 6.2.3.
\editstart 75 replace
\+
\\ \cltxt
which ignores alphabetic case and certain other attributes
of characters;
\\
\-
\\ \bf with
\+
\\ \cltxt
which ignores alphabetic case
of characters;
\-
\editend
%----------------------------------------------------------------------
\setcounter{section}{6}
\section{Control Structure} % 7
%----------------------------------------------------------------------
\setcounter{subsection}{1}
\subsection{Generalized Variables} % 7.2.
\editstart 19 modify table
\+
\\ \cltxt
char string-char
\\
schar string-char
\-
\\ \bf with
\+
\\ \cltxt
char character
\\
schar character
\-
\editend
\editstart 22 delete table entry
\+
\\ \cltxt
char-bit first set-char-bit
\-
\editend
%----------------------------------------------------------------------
\setcounter{section}{9}
\section{Symbols} % 10
%----------------------------------------------------------------------
\editstart 3 replace
\+
\\ \cltxt
It is ordinarily not permitted to alter a symbol's print name.
\-
\\ \bf with
\+
\\ \cltxt
It is an error to alter a symbol's print name.
\-
\editend
\setcounter{subsection}{1}
\subsection{The Print Name} % 10.2.
\editstart 5 replace
\+
\\ \cltxt
It is an extremely bad idea
\-
\\ \bf with
\+
\\ \cltxt
It is an error and an extremely bad idea
\-
\editend
%----------------------------------------------------------------------
\setcounter{section}{12}
\section{Characters} % 13
%----------------------------------------------------------------------
\setcounter{subsection}{0}
\subsection{Character Attributes} % 13.1.
\editstart 1 replace
\+
\\ \cltxt
Every character has three attributes: code, bits, and font. The
code attribute is
\\
intended to distinguish among the printed glyphs and formatting
functions for
\\
characters. The bits attribute allows extra flags to be associated
with a character.
\\
The font attribute permits a specification of the style of the glyphs
(such as italics).
\-
\\ \bf with
\+
\\ \cltxt
Every character has three attributes: code, character-set, and
character-set-index.
\\
The code attribute is intended to distinguish among glyphs and
formatting functions for
\\
characters. The character-set and character-set-index attributes
identify the character's
\\
membership within a specific character set. Combined, character-set
and character-set-index
\\
encode the same information as the code attribute.
\-
\editend
\editstart 3 append
\+
\\ \cltxt
There may be unassigned codes between 0 and {\clkwd char-code-limit} which
\\
are not legal arguments to {\clkwd code-char}.
\-
\editend
\editstart 4 delete
\editend
\editstart 5 delete
\editend
\editstart 6 delete
\editend
\editstart 7 delete
\editend
\editstart 8 delete
\editend
\editstart 9 delete
\editend
\setcounter{subsection}{1}
\subsection{Predicates on Characters} % 13.2.
\editstart 3 replace
\+
\\ \cltxt
argument is a ``standard character'' that is, an object of type
{\clkwd standard-char}.
\\
Note that any character with a non-zero {\em bits} or {\em font}
attribute is non-standard.
\-
\\ \bf with
\+
\\ \cltxt
argument is one of the Common LISP standard character subrepertoire.
\-
\editend
\editstart 5 delete
\+
\\ \cltxt
The semi-standard characters \#$\backslash${\clkwd Backspace},
\#$\backslash${\clkwd Tab},
\#$\backslash${\clkwd Rubout},
\#$\backslash${\clkwd Linefeed},
\#$\backslash${\clkwd Return},
and \#$\backslash${\clkwd Page} are not graphic.
\-
\editend
\editstart 6 delete
\editend
\editstart 7 delete
\editend
\editstart 8 delete
\editend
\editstart 9 delete
\editend
\editstart 12 replace
\+
\\ \cltxt
If a character is alphabetic, then it is perforce graphic. Therefore
any character
\\
with a non-zero bits attribute cannot be alphabetic. Whether a
character is alphabetic
\\
may depend on its font number.
\-
\\ \bf with
\+
\\ \cltxt
If a character is alphabetic, then it is perforce graphic.
\-
\editend
\editstart 21 replace
\+
\\ \cltxt
If a character is either uppercase or lowercase, it is necessarily
alphabetic (and
\\
therefore is graphic, and therefore has a zero bits attribute).
\\
However, it is permissible in theory for an alphabetic character
to be neither uppercase
\\
nor lowercase (in a non-Roman font, for example).
\-
\\ \bf with
\+
\\ \cltxt
If a character is either uppercase or lowercase, it is necessarily
alphabetic (and
\\
therefore is graphic).
\-
\editend
\editstart 24 replace
\+
\\ \cltxt
The argument {\em char} must be a character object, and {\em radix}
must be a non-negative
\\
integer. If {\em char} is not a digit of the radix specified
\-
\\ \bf with
\+
\\ \cltxt
The argument {\em char} must be in the standard character
subrepertoire and
\\
{\em radix} must be a non-negative integer.
\\
If {\em char} is not a standard character or is not a digit of the
radix specified
\-
\editend
\editstart 46 delete
\editend
\editstart 47 replace
\+
\\ \cltxt
If two characters differ in any attribute (code, bits, or font), then
they
\-
\\ \bf with
\+
\\ \cltxt
If two characters differ in any attribute, then
they
\-
\editend
\editstart 89 replace
\+
\\ \cltxt
The predicate {\clkwd char-equal} is like {\clkwd char=}, and
similarly for the others, except
\\
according to a different ordering such that differences of bits
\\
attributes and case are ignored, and font information is taken into
\\
account in an implementation dependent manner.
\-
\\ \bf with
\+
\\ \cltxt
The predicate {\clkwd char-equal} is like {\clkwd char=}, and
similarly for the others, except
\\
according to a different ordering such that differences of case and
\\
implementation defined attributes are ignored.
\-
\editend
\editstart 93 delete
\editend
\setcounter{subsection}{2}
\subsection{Character Construction and Selection} % 13.3.
\editstart 3 replace
\+
\\ \cltxt
this will be a non-negative integer less than the (normal) value
\-
\\ \bf with
\+
\\ \cltxt
this will be a non-negative integer less than the value
\-
\editend
\editstart 4 delete
\editend
\editstart 5 delete
\editend
\editstart 6 delete
\editend
\editstart 7 delete
\editend
\editstart 8 replace
\+
\\ \cltxt
{\clkwd code-char {\em code} \&optional {\em (bits 0) (font 0)}
[{\em Function}]}
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd code-char {\em code}
[{\em Function}]}
\-
\editend
\editstart 9 replace
\+
\\ \cltxt
All three arguments must be non-negative integers. If it is possible
in the
\\
implementation to construct a character object whose code attribute
is {\em code}, whose
\\
bits attribute is {\em bits}, and whose font attribute is {\em font},
then such an object is returned;
\-
\\ \bf with
\+
\\ \cltxt
The argument must be a non-negative integer. If it is possible
in the
\\
implementation to construct a character object whose code attribute
is {\em code},
\\
then such an object is returned;
\-
\editend
\editstart 10 replace
\+
\\ \cltxt
For any integers, {\em c, b,} and {\em f}, if {\clkwd (code-char
{\em c b f})} is
\-
\\ \bf with
\+
\\ \cltxt
For any integer, {\em c}, if {\clkwd (code-char
{\em c})} is
\-
\editend
\editstart 12 delete
\editend
\editstart 13 delete
\editend
\editstart 14 replace
\+
\\ \cltxt
If the font and bits attributes of a character object {\clkwd c}
are zero, then it is the case that
\-
\\ \bf with
\+
\\ \cltxt
If the implementation defined
attributes of a character object {\clkwd c}
do not exist, then
\-
\editend
\editstart 17 delete
\editend
\editstart 18 delete
\editend
\editstart 19 delete
\editend
\setcounter{subsection}{3}
\subsection{Character Conversions} % 13.4.
\editstart 8 replace
\+
\\ \cltxt
{\clkwd char-upcase} returns a character object with the same
font and bits attributes
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd char-upcase} returns a character object with the same
implementation defined attributes
\-
\editend
\editstart 10 replace
\+
\\ \cltxt
Similarly, {\clkwd char-downcase} returns a character object with the
\\
same font and bits attributes
\-
\\ \bf with
\+
\\ \cltxt
Similarly, {\clkwd char-downcase} returns a character object with the
\\
same implementation defined attributes
\-
\editend
\editstart 12 delete
\editend
\editstart 13 replace
\+
\\ \cltxt
{\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
({\em font} 0) [{\em Function}]}
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
[{\em Function}]}
\-
\editend
\editstart 14 replace
\+
\\ \cltxt
All arguments must be integers. {\clkwd digit-char} determines
whether or not it is possible
\\
to construct a character object whose font attribute is {\em font},
and whose {\em code}
\-
\\ \bf with
\+
\\ \cltxt
All arguments must be integers. {\clkwd digit-char} determines
whether or not it is possible
\\
to construct a character object whose {\em code}
\-
\editend
\editstart 15 replace
\+
\\ \cltxt
{\clkwd digit-char} cannot return {\clkwd nil} if {\em font}
is zero, {\em radix}
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd digit-char} cannot return {\clkwd nil}.
{\em radix}
\-
\editend
\editstart 22 delete
\editend
\editstart 32 replace
\+
\\ \cltxt
All characters that have zero font and bits attributes and that are
non-graphic
\-
\\ \bf with
\+
\\ \cltxt
All characters that are
non-graphic
\-
\editend
\editstart 35 delete
\editend
\setcounter{subsection}{4}
\subsection{Character Control-Bit Functions} % 13.5.
\editstart delete entire section
\editend
%----------------------------------------------------------------------
\setcounter{section}{13}
\section{Sequences} % 14
%----------------------------------------------------------------------
\setcounter{subsection}{0}
\subsection{Simple Sequence Functions} % 14.1
\editstart 24 append
\+
\\ \cltxt
If type {\clkwd string} is specified, a string of type
{\clkwd extended-string} is returned.
\-
\editend
\setcounter{subsection}{1}
\subsection{Concatenating, Mapping, and Reducing Sequences} % 14.2.
\editstart 3 append
\+
\\ \cltxt
If {\em result-type} {\clkwd string} is specified, any string
\\
subtype which can hold the elements of the sequence can be returned.
\-
\editend
\editstart 6 append
\+
\\ \cltxt
If {\em result-type} {\clkwd string} is specified, any string
\\
subtype which can hold the elements of the sequence can be returned.
\-
\editend
\setcounter{subsection}{2}
\subsection{Modifying Sequences} % 14.3.
\editstart 29 append
\+
\\ \cltxt
If {\em newitem} is of type {\clkwd string}, any string subtype
\\
which can hold the elements of the result sequence can be returned.
\-
\editend
\editstart 36 append
\+
\\ \cltxt
If {\em newitem} is of type {\clkwd string}, any string subtype
\\
which can hold the elements of the result sequence can be returned.
\-
\editend
\setcounter{subsection}{4}
\subsection{Sorting and Merging} % 14.5.
\editstart 20 append
\+
\\ \cltxt
If {\em result-type} {\clkwd string} is specified, any string subtype
\\
which can hold the elements of the result sequence can be returned.
\-
\editend
%----------------------------------------------------------------------
\setcounter{section}{17}
\section{Strings} % 18
%----------------------------------------------------------------------
\editstart 1 replace
\+
\\ \cltxt
Specifically, the type {\clkwd string} is identical to the type
{\clkwd (vector string-char),}
\\
which in turn is the same as {\clkwd (array string-char (*))}.
\-
\\ \bf with
\+
\\ \cltxt
Specifically, the type {\clkwd string} is a subtype of
{\clkwd vector}
\\
and consists of vectors specialized by subtypes of {\clkwd character}.
\-
\editend
\setcounter{subsection}{0}
\subsection{String Access} % 18.1.
\editstart 3 replace
\+
\\ \cltxt
{\clkwd schar} {\em simple-string index} [{\em Function}]
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd schar} {\em simple-base-string index} [{\em Function}]
\-
\editend
\editstart 4 replace
\+
\\ \cltxt
character object. (This character will necessarily satisfy the
predicate {\clkwd string-char-p}).
\-
\\ \bf with
\+
\\ \cltxt
character object.
\-
\editend
\editstart 10 replace
\+
\\ \cltxt
it must be a simple string.
\-
\\ \bf with
\+
\\ \cltxt
it must be a simple base string.
\-
\editend
\setcounter{subsection}{2}
\subsection{String Construction and Manipulation} % 18.3.
\editstart 2 replace
\+
\\ \cltxt
{\clkwd make-string {\em size} \&key :initial-element [{\em Function}]}
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd make-string {\em size} \&key :initial-element :element-type
[{\em Function}]}
\-
\editend
\editstart 3 replace
\+
\\ \cltxt
This returns a string (in fact a simple string) of length {\em size},
\-
\\ \bf with
\+
\\ \cltxt
This returns a string of length {\em size},
\-
\editend
\editstart 5 replace
\+
\\ \cltxt
A string is really just a one-dimensional array of "string
characters" (that is,
\\
those characters that are members of type {\clkwd string-char}).
\\
More complex character arrays may be constructed using the function
{\clkwd make-array}.
\-
\\ \bf with
\+
\\ \cltxt
More complex character arrays may be constructed using the function
{\clkwd make-array}.
\-
\editend
\editstart 29 replace
\+
\\ \cltxt
If {\em x} is a string character (a character of type
{\clkwd string-char}), then
\-
\\ \bf with
\+
\\ \cltxt
If {\em x} is a character, then
\-
\editend
%----------------------------------------------------------------------
\setcounter{section}{21}
\section{Input/Output} % 22
\setcounter{subsection}{0}
\subsection{Printed Representation of LISP Objects} % 22.1.
\setcounter{subsubsection}{0}
\subsubsection{What the Read Function Accepts} % 22.1.1.
\editstart delete from Table 22-1: Standard Character Syntax Types
\+
\\ \cltxt
<tab> {\em whitespace}
\\
<page> {\em whitespace}
\\
<backspace> {\em constituent}
\\
<return> {\em whitespace}
\\
<rubout> {\em constituent}
\\
<linefeed> {\em whitespace}
\-
\editend
\setcounter{subsubsection}{1}
\subsubsection{Parsing of Numbers and Symbols} % 22.1.2.
\editstart delete from Table 22-3: Standard Constituent Character
Attributes
\+
\\ \cltxt
<backspace> {\em illegal}
\\
<tab> {\em illegal}
\\
<linefeed> {\em illegal}
\\
<page> {\em illegal}
\\
<return> {\em illegal}
\\
<rubout> {\em illegal}
\-
\editend
\setcounter{subsubsection}{3}
\subsubsection{Standard Dispatching Macro Character Syntax} % 22.1.4.
\editstart delete from Table 22-4: Standard \# Macro Character Syntax
\+
\\ \cltxt
\#<backspace> {\em signals error}
\\
\#<tab> {\em signals error}
\\
\#<linefeed> {\em signals error}
\\
\#<page> {\em signals error}
\\
\#<return> {\em signals error}
\\
\#<rubout> {\em undefined}
\-
\editend
\editstart ??? add
\+
\\ \cltxt
Table 22-4 and its accompanying text are extended to include a construct for
extended character objects.
\-
\editend
\editstart 11 through 18 inclusive delete
\editend
\editstart 20 through 26 inclusive delete
\editend
\editstart 108 replace
\+
\\ \cltxt
{\clkwd \#<space>, \#<tab>, \#<newline>, \#<page>, \#<return>}
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd \#<space>, \#<newline>}
\-
\editend
\setcounter{subsubsection}{4}
\subsubsection{The Readtable} % 22.1.5.
\editstart 3 replace
\+
\\ \cltxt
Even if an implementation supports characters with non-zero
{\em bits} and {\em font}
\\
attributes, it need not (but may) allow for such characters to
have syntax descriptions
\\
in the readtable. However, every character of type
{\clkwd string-char} must be
\\
represented in the readtable.
\-
\\ \bf with
\+
\\ \cltxt
Even if an implementation supports extended characters, it
need not
\\
(but may) allow for such characters to
have syntax descriptions
\\
in the readtable. However, every character of type
{\clkwd base-character} must be
\\
represented in the readtable.
\-
\editend
\setcounter{subsubsection}{5}
\subsubsection{What the Print Function Produces} % 22.1.6.
\editstart 13 replace
\+
\\ \cltxt
is used. For example, the printed representation of the character
\#$\backslash$A with control
\\
and meta bits on would be \#$\backslash${\clkwd CONTROL-META-A},
and that of
\\
\#$\backslash$a with control and meta bits on would be
\#$\backslash${\clkwd CONTROL-META-$\backslash$a}.
\-
\\ \bf with
\+
\\ \cltxt
is used.
\-
\editend
\setcounter{subsection}{2}
\subsection{Output Functions} % 22.3.
\setcounter{subsubsection}{0}
\subsubsection{Output to Character Streams} % 22.3.1.
\editstart 27 insert after
\+
\\ \cltxt
{\clkwd external-width} {\em object} \&{\clkwd optional}
{\em output-stream} [{\em Function}]
\\
\\
{\clkwd external-width} returns the number of host system base
\\
character units required for the object on the output-stream. If
not applicable to the output
\\
stream (for example, a display device
with proportional fonts), the function should return {\clkwd nil}.
\-
\editend
\setcounter{subsubsection}{2}
\subsubsection{Formatted Output to Character Streams} % 22.3.3.
\editstart 23 delete entire example
\+
\\ \cltxt
{\clkwd (format nil "Type} $\tilde{ }$
{\clkwd :C to $\tilde{ }$ :A."} . . .
\-
\editend
\editstart 66 replace
\+
\\ \cltxt
$\tilde{ }${\clkwd :C} spells out the names of the control bits and
represents non-printing
\\
characters by their names: {\clkwd Control-Meta-F, Control-Return,
Space}. This is a "pretty" format for printing characters.
\-
\\ \bf with
\+
\\ \cltxt
$\tilde{ }${\clkwd :C}
represents non-printing
\\
characters by their names: {\clkwd Newline,
Space}. This is a "pretty" format for printing characters.
\-
\editend
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\setcounter{section}{22}
\section{File System Interface} % 23
\setcounter{subsection}{1}
\subsection{Opening and Closing Files} % 23.2.
\editstart 2 replace
\+
\\ \cltxt
{\clkwd open {\em filename} \&key :direction :element-type}
[{\em Function}]
\\
{\clkwd :if-exists :if-does-not-exist}
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd open {\em filename} \&key :direction :element-type}
[{\em Function}]
\\
{\clkwd :external-code-format
:character-set}
\\
{\clkwd :if-exists :if-does-not-exist}
\-
\editend
\editstart 11 replace
\+
\\ \cltxt
{\clkwd string-char}
\\
The unit of transaction is a string-character. The functions
{\clkwd read-char}
\\
and/or {\clkwd write-char} may be used on the stream. This is
the default.
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd base-character}
\\
The unit of transaction is a base-character. The functions
{\clkwd read-char}
\\
and/or {\clkwd write-char} may be used on the stream. This is
the default.
\-
\editend
\editstart 16 replace
\+
\\ \cltxt
{\clkwd character}
\\
The unit of transaction is any character, not just a string-character.
The functions
\-
\\ \bf with
\+
\\ \cltxt
{\clkwd character}
\\
The unit of transaction is any character.
The functions
\-
\editend
\editstart 19 insert after
\+
\\ \cltxt
{\clkwd :external-code-format}
\\
The
\-
\editend
\editstart 19 insert after
\+
\\ \cltxt
{\clkwd :character-set}
\\
The
\-
\editend
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\begin{thebibliography}{wwwwwwww 99}
\bibitem[Ida87]{ida87} M. Ida, et al.,
{\em
JEIDA Common LISP Committee Proposal on Embedding Multi-Byte Characters
},
ANSI X3J13 document 87-022, (1987).
\bibitem[Linden87]{linden87} T. Linden,
{\em
Common LISP - Proposed Extensions for International Character Set
Handling
},
Version 01.11.87, IBM Corporation (1987).
\bibitem[Kerns87]{kerns87} R. Kerns,
{\em
Extended Characters in Common LISP
},
X3J13 Character Subcommittee document, Symbolics Inc (1987).
\bibitem[Steele84]{steele84} G. Steele Jr.,
{\em
Common LISP: the Language
},
Digital Press (1984).
\end{thebibliography}
\end{document} % End of document.
∂30-Jun-88 0823 CL-Characters-mailer latex document
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 30 Jun 88 08:23:38 PDT
Date: Thu, 30 Jun 88 08:08:34 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880630.080834.baggins@IBM.com>
Subject: latex document
Paul Beiser is having trouble getting the appendix to print. Is
anyone else having problems printing?
Regards,
Thom
∂14-Jul-88 1521 CL-Characters-mailer subcommittee document
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 14 Jul 88 15:21:31 PDT
Date: Thu, 14 Jul 88 15:11:03 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880714.151103.baggins@IBM.com>
Subject: subcommittee document
I haven't heard any comments on the preliminary proposal. Please
ensure it is read and that your comments/corrections are made in the
next two weeks. Remember, the first week in August is our
schedule for releasing it from subcommittee.
Regards,
Thom
∂20-Jul-88 1346 CL-Characters-mailer Forwarding comments from Paul.
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 20 Jul 88 13:45:50 PDT
Date: Wed, 20 Jul 88 13:24:31 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880720.132431.baggins@IBM.com>
Subject: Forwarding comments from Paul.
-------------------------------------------------------
I have several comments on the draft. First of all, typos. There should be
a "." after "[Steele84]" on page 1. On the same page, replace "Providing"
(under the first bulleted item) with "To provide" (to make it consistent
with the other bulleted items).
It really looks pretty good. Currently I have 4 people reviewing it
within HP (including someone in our Japan group), and their comments should
be back to me around Aug 1. I also forwarded a copy to Lucid to get
their reactions to it - after all, they will be implementing it eventually
for us!
Other comments.
*) Standard # Macro Character Syntax. I do not believe that there can be
a standard convention here UNLESS we have standard character set
identifiers. The proposal specifically avoids this (see footnote 4, pg
6). I do not see how the reader could read such a character unless
these character set identifiers were known to it. So, it looks as if
the only way to embed such a character would be with read-time evaluation
of functions CODE-CHAR or MAKE-CHAR, which leads me to ask: do these
functions need a :character-set option like OPEN does?
*) I think that sticking with simple-base-string only and eliminating
simple-string is good. However, I guess I could see the need for a
simple-extended-string type if we could guarantee that all extended
strings have the same "width" (that is, strings have either base characters,
in which the widths are known, or they have characters, in which the
width is known), but I do not think that this is part
of the proposal (well, at least I could not find it!). Should it be?
I guess I'm not sure.
*) I guess we need an EXTERNAL-WIDTH function if we have a multiplicity of
character widths. This leads to another question: can't we have just 2
widths and make them constants?? Like BASE-CHARACTER-WIDTH and
CHARACTER-WIDTH?
*) On page 20, you have "For example (characterp #\A :standard)". What are
the other allowable keyword arguments here? Is :base one of them? Are
all character set identifiers allowed here?
*) On page 21 you have "Every character has three attributes: code,
character-set, and character-set-index". Are there functions to return
character-set and character-set-index given a character? Are they
setf'able?
*) MAKE-STRING has a new argument, :element-type. What are the allowable
values here?
I guess more than anything, the draft lacks good examples to point out
questions people may have. If I'm confused, I think that is probably
the main reason. I think some of your footnotes have some good examples -
maybe we need to move those up into the text and make them fully part of
the proposal.
Another shortcoming is the lack of an implementation. I think one of the strengths
of the CLOS and Error Signalling Proposals was that they had quite a
bit of experience with implementations and were able to really understand
things because of that. I know that Symbolics has an implementation, but
unfortunately I am not familiar with it, nor do I have access.
I guess we'll have lots of work to do before and at the Oct meeting!
I will get the other comments and send them to you as soon as I get them.
Regards,
Paul
∂05-Aug-88 1154 CL-Characters-mailer comments on draft
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 5 Aug 88 11:54:07 PDT
Date: Fri, 05 Aug 88 10:24:20 PDT
From: Thom Linden <baggins@ibm.com>
To: Paul Beiser <paul%hpfclp.sde.hp.com@relay.cs.net>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880805.102420.baggins@IBM.com>
Subject: comments on draft
A few comments on your comments (in the same *'ed order you listed):
1) Standard # macro character syntax.....
This is a good point. In fact, I think bob kerns mentioned that
this is the place to use the ISO glyph identifiers listed on
p. 15.
ie. #\LA01 is equivalent to #\a .
This would be portable across systems.
Unfortunately this leads to a large set of such names when
considering, say, the Kanji glyph set and a corresponding
performance burden. Thus the rationale for
a form which is an encoding of the
glyphid. The form would allow an implementation
which can handle multiple glyph sets to provide this function
even in an environment (eg. file system) which does not.
A suggestion from LUCID was: #\name:xxxx where name
is the character [sub]repertoire name and xxxx is the index of
the character in hexadecimal. Strings thus are printed
as #( #\name:xxxx #\name:yyyy ... )
Thus, for example, #\JIS:4F35 could be read into my lisp
implementation from a file it knows contains only standard-chars
and treated, say, as the Yen glyph.
At this point, I don't like this either. It seems to be
supporting a (hopefully) interim problem where the lisp
implementation has capabilities greater than its environment.
Thus, I now think leaving things the way they are is correct.
ie. The only standardized 'named' glyphs are #\space and #\newline.
All others represent themselves. #\a represents LA01..etc. Of
course, the file can only contain characters allowed by
the file's :character-set and :external-code-format values.
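As a rough illustration of the #\name:xxxx form discussed in this item, the
following sketch splits such a token into a repertoire name and a hexadecimal
index. PARSE-EXTENDED-CHAR-NAME is a hypothetical helper written for this
note; it is not part of the proposal or of any implementation, and it does not
map the result to an actual character object.

```lisp
;; Sketch only: PARSE-EXTENDED-CHAR-NAME is a hypothetical helper.
(defun parse-extended-char-name (token)
  "Split TOKEN of the form \"name:xxxx\" into a repertoire keyword and an
integer index read as hexadecimal.  Returns two values, or NIL when TOKEN
has no colon or the index part is empty or malformed."
  (let ((colon (position #\: token)))
    (when (and colon (< (1+ colon) (length token)))
      (multiple-value-bind (index end)
          (parse-integer token :start (1+ colon) :radix 16 :junk-allowed t)
        ;; Accept the token only if the hex digits run to the end of it.
        (when (and index (= end (length token)))
          (values (intern (string-upcase (subseq token 0 colon)) :keyword)
                  index))))))
```

Under these assumptions, (parse-extended-char-name "JIS:4F35") yields the
keyword :JIS and the integer #x4F35; what glyph that index denotes is left to
the implementation.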
2) I think that sticking with simple-base-string.....
3) I guess we need an EXTERNAL-WIDTH function ...
The rationale here is that an implementation may have more than
two widths just as it may have more than one variety of
extended-character. For example, a Korean glyph set might
be kept in a 3 byte cell, a Kanji set in a 2 byte cell and
the base in 1 byte.
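To make the multiple-width point concrete, here is a minimal sketch of how a
width could be summed over a string when cells come in 1, 2 and 3 byte sizes.
The function names and the code ranges that select each width are assumptions
made up for illustration; a real implementation would consult the stream's
external code format instead.

```lisp
;; Sketch only: the widths and the code ranges are invented for illustration.
(defun char-cell-width (char)
  "Return a hypothetical cell width, in bytes, for CHAR."
  (let ((code (char-code char)))
    (cond ((< code #x100) 1)      ; assumed base characters
          ((< code #x10000) 2)    ; assumed 2 byte repertoire
          (t 3))))                ; assumed 3 byte repertoire

(defun string-external-width (string)
  "Sum the hypothetical cell widths of the characters in STRING."
  (reduce #'+ string :key #'char-cell-width))
```

With only two fixed widths a pair of constants would do; once a third width
is possible, the width has to be computed per character as above.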
4) On page 20, you have "For example (characterp #\A :standard)"....
Any character [sub]repertoire name is allowed here. :standard is
the only one ANSI CL defines. Others could be unique
to an implementation but are more likely names like :ISO8859-1988
or :JISxxxxx. (see page 6 for some discussion of this).
5) On page 21 you have "Every character......
The code is currently not decomposable. There is the test above
for the character-set. I'm open to some function suggestions
to extract character-set or character-set-index.
eg. (char-character-set char) and
(char-character-set-index char)
They could be setf'able. I seem to recall some discussion of
this previously. Unfortunately, I don't recall the details (Larry?,
Bob?). I would guess a problem with portability of code using
any such functions. In Bob Kerns's paper, he mentions (p 17)
implementations may dynamically load character-sets and
assign character codes on an as-need basis. In the IBM
proposal, we suggested char-split and char-join for decoding
and encoding respectively.
Another suggestion (via LUCID), was to replace char-split
with two functions:
(char-code-index char-code) which takes a character code
and returns the index and
(char-code-set char-code) which returns the character set.
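A minimal sketch of the two-function decoding suggested above, operating on
integer character codes. The names CHAR-CODE-SET and CHAR-CODE-INDEX come from
the suggestion; JOIN-CHAR-CODE and the 16 bit index field are assumptions
introduced only for this example, since the proposal leaves the layout of a
code to the implementation.

```lisp
;; Sketch only: a code is assumed to pack the set number above a 16 bit index.
(defconstant +index-bits+ 16)

(defun join-char-code (set index)
  "Combine a character set number and an index into one integer code."
  (+ (ash set +index-bits+) index))

(defun char-code-set (code)
  "Return the character set number held in the high bits of CODE."
  (ash code (- +index-bits+)))

(defun char-code-index (code)
  "Return the index held in the low bits of CODE."
  (ldb (byte +index-bits+ 0) code))
```

Portable code could not rely on any particular packing, which is the
portability concern raised in this item.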
6) Make-string has a new argument, :element-type.....
Right. The document fails to mention what :element-type
allows. I will amend it to say valid values are any
character type/subtype (eg. :element-type '(character :standard))
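For readers who want to try the :element-type argument, the sketch below uses
the type names that ANSI Common Lisp later adopted (BASE-CHAR and CHARACTER)
rather than the draft's BASE-CHARACTER or (CHARACTER :STANDARD), which are not
assumed to exist in any implementation. MAKE-SAMPLE-STRINGS is a hypothetical
name used only here.

```lisp
;; Sketch only: uses the later ANSI type names, not the draft's.
(defun make-sample-strings ()
  "Return two strings with equal contents but different element types."
  (values (make-string 3 :initial-element #\a :element-type 'base-char)
          (make-string 3 :initial-element #\a :element-type 'character)))
```

Both results compare equal under STRING=; only the element type, and hence
which characters may later be stored, differs.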
Your final point on a lack of good examples is correct. Perhaps
you can get with Larry (who volunteered for examples!) and
formulate some for insertion into the doc. Any and all examples
from anyone are welcome!
------------------------------------------
I intend to update the document to reflect Paul's comments #4 and #6.
I will wait on #5 until a) people jog my memory on why not and
b) Paul makes a specific proposal. #1-3 I will leave as documented
currently if nobody objects strongly (ie. makes a specific proposal).
Also, I will change all the document references to 'deleted'
paragraphs of CLtL to include the first ten words of the paragraph.
This should ease the burden of the reader counting paragraphs.
Any additional changes (eg. example insertions) should be provided
in the next two weeks so we can make our end of month deadline
for distribution.
--------------------------------------------
Our voting time is here!
Note that further changes are becoming less likely
with the deadline coming near. Minor editorial changes
can be made throughout August but major suggestions are unlikely
to make the document (eg. rework this section, etc.) unless
you provide the work immediately!
So, by August 15, please place your vote on the subcommittee
forum on sail. The vote issue is: SHOULD THE DOCUMENT AS IT STANDS,
barring the updates mentioned above and any other minor editorial
changes, BE RELEASED TO X3J13 ON 31 AUGUST.
We are quite informal so YES votes with notations
are encouraged (eg. I vote YES but want the additions: xxx
changes: xxxx
deletions: xxxx
and have comments: xxxx)
NO votes MUST be accompanied with notations:
(eg. I vote NO but would vote YES if additions:xxxx
changes: xxxx
deletions: xxxx
and have comments: xxxx)
If a simple majority vote is in favor the document WILL BE RELEASED
and I will request subvotes on any addition,change,deletion
(sub)proposals which do not simply amplify the existing document.
If not, the document WILL NOT BE RELEASED unless another vote
is taken.
No vote is considered an ABSTENTION.
I should clarify that IF RELEASED, the document is still subject
to change by: suggestions/changes from X3J13 and by further
amplification by our subcommittee (eg. at the October meeting). The
schedule I am following is:
31 August 88 ----- release document to x3j13
12 October 88 ----- discussion and vote by x3j13
mid November 88 ----- final modifications made per
X3J13 and subcommittee
mid+1 November 88 ----- document to editor
January 89 ----- ANSI Common LISP draft which
includes character extensions
--------------------------------------------
Gary and Bob, please make sure that Mike Beckerle sees a copy
of this note (as I don't believe he is connected yet).
--------------------------------------------
Any informal participants in this forum are also encouraged to
respond. votes will be encouraging if YES and discouraging if NO
but won't affect the tally. ---- comments and specific suggestions
are especially welcome!
--------------------------------------------
Sorry for the long message. I'm on vacation next week but will be
back on the 15th.
Regards,
Thom
∂16-Aug-88 0319 CL-Characters-mailer document vote
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 Aug 88 03:19:30 PDT
Date: Mon, 15 Aug 88 19:35:24 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880815.193524.baggins@IBM.com>
Subject: document vote
I'm back from vacation. Per my message from 5 August, I would
like to receive your votes on the document as distributed.
So far, your responses have not arrived. I'm expecting replies from:
Mike Beckerle
Paul Beiser
Bob Kerns
Kevin Layer
Larry Masinter
Gary Palter ?you joined recently, are you voting ?
Carl Hoffman ?haven't heard from you for quite a while,
are you voting ?
anyone else think they should be on this list? Also, comments
are invited.
Regards,
Thom
∂17-Aug-88 0432 CL-Characters-mailer document vote
Received: from ucbarpa.Berkeley.EDU by SAIL.Stanford.EDU with TCP; 17 Aug 88 04:32:02 PDT
Received: by ucbarpa.Berkeley.EDU (5.59/1.29)
id AA16311; Wed, 17 Aug 88 04:30:33 PDT
Received: by franz (3.2/3.14)
id AA19148; Wed, 17 Aug 88 04:01:50 PDT
Received: by feast (5.5/3.14)
id AA00254; Tue, 16 Aug 88 22:53:27 EDT
Date: Tue, 16 Aug 88 22:53:27 EDT
From: franz!feast!smh@ucbarpa.Berkeley.EDU (Steven M. Haflich)
Message-Id: <8808170253.AA00254@feast>
To: franz!ibm.com!baggins
Cc: franz!sail.stanford.edu!cl-characters
In-Reply-To: Thom Linden's message of Mon, 15 Aug 88 19:35:24 PDT <880815.193524.baggins@IBM.com>
Subject: document vote
(I've been tracking this list silently.)
FYI, Bob Kerns happens to be out of the country right now and won't be
back for about two weeks.
∂17-Aug-88 0749 CL-Characters-mailer forwarding Paul's message
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 17 Aug 88 07:49:19 PDT
Date: Tue, 16 Aug 88 14:04:23 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880816.140423.baggins@IBM.com>
Subject: forwarding Paul's message
------------------------------------------------------------
Received: from hpfclp.sde.hp.com by IBM.COM on 08/16/88 at 10:28:10 PDT
Received: from hpfclp.sde.hp.com (hpfclp) by hplabs.HP.COM with SMTP ; Tue, 16 Aug 88 09:27:11 PST
Received: from hpfcpsb.HP.COM by hpfclp.sde.hp.com; Tue, 16 Aug 88 11:25:49 mdt
Received: from hpfcpsb by hpfcpsb.HP.COM; Tue, 16 Aug 88 11:24:48 mdt
To: Thom Linden <baggins@ibm.com>
Subject: Re: Vote
X-Mailer: mh6.5
Date: Tue, 16 Aug 88 11:24:34 MDT
Message-Id: <3505.587755474@hpfcpsb>
From: paul@hpfclp.sde.hp.com
Thom,
Welcome back!
I vote YES, with:
*) we need more examples. I would suggest that someone with implementation
experience (maybe Bob Kerns or someone from Lucid) furnish some examples.
Regards,
Paul
P.S. I will be at AAAI Aug 20-26th, and then I am on vacation until Sept 8.
∂17-Aug-88 1851 CL-Characters-mailer request for comments to X3J13 subcommittee proposal
Received: from lucid.com by SAIL.Stanford.EDU with TCP; 17 Aug 88 18:51:31 PDT
Received: from rainbow-warrior ([192.9.200.16]) by heavens-gate id AA03609g; Wed, 17 Aug 88 15:35:34 PST
Received: by rainbow-warrior id AA27892g; Wed, 17 Aug 88 16:33:59 PDT
Date: Wed, 17 Aug 88 16:33:59 PDT
From: Dave Unietis <dru@lucid.com>
Message-Id: <8808172333.AA27892@rainbow-warrior>
To: cl-characters@sail.stanford.edu
Subject: request for comments to X3J13 subcommittee proposal
Cc: dru@lucid.com
By way of introduction, I work at Lucid Inc., where I am involved in
adding DBCS character support to Lucid Common Lisp.
The following is our response to a request for comments on the latest
draft of the X3J13 character subcommittee proposal.
Although these comments are quite lengthy, and do raise several issues that
we feel merit further examination, I should say up front that we are in
general agreement with most of the substance of the current proposal draft,
and appreciate the effort of Thom Linden and the character subcommittee
towards this standardization effort.
In rough order of importance:
Simple-strings
The type simple-string should not be eliminated. One of the tenets
of the JEIDA proposal, reinforced in our discussions with the Japanese,
is that existing programs that work with characters, string-chars, and
strings should continue to work unmodified with extended characters and
extended strings. We feel this design consideration to be primary.
In Lucid Common Lisp, SCHAR is used by most existing programs that manipulate
strings, because most of the time strings don't require fill pointers, etc.,
and because SCHAR is optimized by the compiler. With the proposed elimination
of simple strings, and redefinition of SCHAR to work only with
simple-base-strings, these programs will have to be recoded to work
with strings containing other than base characters.
I don't understand why simple strings are considered "ambiguous", as suggested
by the cover letter. A simple string is precisely a string that does not have
a fill pointer, is not displaced to another string, and may not have its
size adjusted dynamically after creation. Simple strings are no more ambiguous
than simple arrays of type T - how the data type is implemented internally is
irrelevant.
I propose retaining the current definition of simple-string and SCHAR, and
adding a new simple-base-string accessor, SBCHAR, which is defined to operate
on simple-base-strings only. Someone making use of such a function would
be explicitly specifying that the string in question contains only base
characters. The resulting type hierarchy more closely parallels the one
defined in the JEIDA proposal.
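One possible reading of the SBCHAR accessor proposed here, as a sketch only.
SBCHAR is not a standard function; the definition below simply layers a type
declaration over SCHAR, and assumes the SIMPLE-BASE-STRING and BASE-CHAR type
names as ANSI Common Lisp later defined them.

```lisp
;; Sketch only: SBCHAR is the proposed accessor, defined here in terms of SCHAR.
(declaim (inline sbchar))
(defun sbchar (string index)
  "Like SCHAR, but STRING is declared to contain only base characters."
  (declare (type simple-base-string string)
           (type fixnum index))
  (schar string index))

(defun sample-base-string ()
  "Build a simple base string for trying SBCHAR (hypothetical helper)."
  (make-array 3 :element-type 'base-char :initial-contents "abc"))
```

Under this reading SCHAR keeps working on every simple string, so existing
programs are unaffected, and callers opt in to the narrower declaration by
writing SBCHAR.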
Most-general-strings
Given that the type string-char is equivalent to the type character in the
subcommittee document proposal, and given that the type string is defined as
(vector string-char), and the type most-general-string is defined as
(vector character) (A.2.15, p19), then why aren't the types string and
most-general-string equivalent? If they are equivalent, then as a type
definition, most-general-string is redundant. If I'm guessing correctly the
intent of the definition of most-general-string, it is to provide a
declaration that indicates that the string in use is not a
base-character-only ("thin") string. We wrestled with the problem of
providing an adequate definition of such a type, and came to the conclusion
that the increase in performance such a data type might provide
did not warrant adding more hair to the array-type gorilla.
Equivalence classes
Our discussions with Japan indicate that this issue is not going to go away.
In fact, the next draft of the JEIDA proposal, due next month, is rumored
to have recommendations regarding treatment of double-byte "alphabetic"
(i.e. English) characters.
I agree that defining dynamically-modifiable equivalence classes has
serious flaws, even if the equivalence state is rebindable, among which are
that symbol EQ-ness is not preserved, and that hash keys may be invalidated.
However, if a character's equivalence class is treated as a static property,
these problems disappear. That is, a character's equivalence class is defined
to be a property similar to whether or not the character is a graphics, digit,
or uppercase character. The process of character canonicalization,
as described in Linden 87, seems no more arbitrary than the current
case-conversion by the reader and case-insensitivity of some of the string
and character predicates.
I feel this mechanism should be retained and that equivalence classes should be
defined statically as properties of the character set(s) supported by
an implementation.
Character code components, character attributes
The latest draft of the proposal seems to be heading in the right
direction, where it states "The convention by which the character set
index and character set identifier are composed into a single integer code
is implementation dependent." However, I feel it doesn't go far enough.
Given that the information from a character's character-set and
character-set-index is captured in its character code, then it is
unnecessary to elevate these properties to the level of attributes, as
described in A.13.1 (p 21). The character set and index of a character
are simply properties, just as whether or not the character is a digit, etc.
are properties. Given all this, it is unnecessary to define functions to
extract or set these "components" of a character. As a matter of fact,
I'm not sure what meaning character-set-index has as a Common Lisp
construct. It is not mentioned in any of the other function definitions
in Appendix A. I agree that an implementation would be wise to
document the mapping from a character's external representation to its
character code, but other than that I don't see what else is necessary.
EXTERNAL-WIDTH, and FORMAT
I feel that an EXTERNAL-WIDTH function (WRITE-WIDTH in the JEIDA proposal)
is necessary. It is easy to try and write this one off as not part of the
language definition, but I think we are blinded by the fact that in
most popular English-only character sets it is always true that
(= width-in-characters width-in-external-code-format-units), and that
the difficulties when this is not the case are not properly appreciated.
For example, there is a problem in the way that FORMAT currently interprets
numeric parameters to directives. Our original plan was to interpret such
parameters as meaning number of characters, which would require no change to
the language definition. The Japanese have convinced us that it is far more
useful to define numeric parameters as meaning the number of bytes required in
the external code format associated with the stream argument to FORMAT. This
allows these directive parameters to be used in producing columnar output, as
long as the width in bytes of the external code format corresponds to the
resulting width of the displayed or printed output, which seems to be the
usual case. At first, we were reluctant to consider introducing an "external"
meaning to an "internal" function such as FORMAT, but after further
consideration, we decided that FORMAT is the appropriate place for this type
of processing.
There is a problem, however, in deciding what to do when NIL is specified as
the stream argument to FORMAT, particularly when used to produce a string
that will in turn be passed as an argument to a subsequent FORMAT.
Also, it would be useful to be able to specify that numeric parameters be
interpreted as number of characters, regardless of the destination stream
argument. Rather than clutter up the already-tortured definition of FORMAT,
we suggest adding the following variable:
*FORMAT-EXTERNAL-WIDTH* - specifies how numeric parameters in a format
control string are interpreted. It can have one of the following values:

  T           With this value, FORMAT uses the destination stream type
              to interpret numeric parameters as external format units
              for this type of stream; if the destination stream type
              is NIL, numeric parameters are interpreted as characters.
              This value is the default.

  NIL         With this value, FORMAT interprets numeric parameters as
              characters, regardless of the destination stream type.

  external    If the value is a keyword that specifies an external code
  format      format recognized by the implementation, FORMAT
              interprets numeric parameters as external format units
              when the destination stream is NIL. If the destination
              stream type is non-NIL, this value has no effect.
Note that for streams of only base-characters, width in characters = width in
external format units, and the values T and NIL above are equivalent.
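The difference between the two interpretations can be sketched as follows.
This is an editorial illustration only: the toy external format (one unit for
character codes below 128, two units otherwise) and the helper names
EXTERNAL-WIDTH, PADDING-WIDTH and PAD-RIGHT are assumptions, and the sketch
ignores the dependence on the destination stream described above.

```lisp
;; Hypothetical variable, named after the one suggested in the text.
(defvar *format-external-width* t)

;; Assumed toy external format: characters with codes below 128 take one
;; unit, all others take two. Real external formats will differ.
(defun external-width (string)
  (loop for ch across string
        sum (if (< (char-code ch) 128) 1 2)))

;; Width used for padding: characters when the variable is NIL,
;; external format units otherwise.
(defun padding-width (string)
  (if (null *format-external-width*)
      (length string)
      (external-width string)))

;; Pad STRING on the right with spaces until it occupies COLUMN units.
(defun pad-right (string column)
  (let ((deficit (- column (padding-width string))))
    (if (plusp deficit)
        (concatenate 'string string
                     (make-string deficit :initial-element #\Space))
        string)))
```

For a string of base characters only, both settings give the same padding;
the results diverge only when a character occupies more than one external
unit.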
Printing characters
The main problem here seems to be to decide what to do when extended
characters are written to a base-character only stream, as existing
mechanisms are sufficient for unrestricted streams.
In escape mode, characters other than base-characters that are written to
a base-character-only stream could be written using an extended definition
of char-name, like the one used by Lucid Common Lisp described below.
This probably isn't general enough to warrant inclusion in the language,
however, except perhaps to note that all characters may be printed to
any stream when in escape mode in some implementation-dependent manner.
In non-escape mode, the problem is more difficult. Given the problems
in developing a general character-by-character encoding with escape
characters, as suggested earlier by Larry Masinter, I think the right thing
to do here is just punt and say "It is an error" to write extended
characters to a base-character-only stream in non-escape mode.
In current implementations of Lucid Common Lisp, all characters may be read
in the following form: #\cxx, where xx is the character code in hexadecimal.
Cxx is the char-name for all non-printable characters that do not have a more
mnemonic name, and is used when printing these characters in escape mode.
This mechanism could be extended to extended characters by simply adding
hexadecimal digits. Extended characters could then be read from and
written to base-character-only streams using this syntax. I don't feel
that including the character set name in the syntax is necessary, as this
information is not explicitly retained when characters are written to
unrestricted streams.
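A minimal sketch of such a naming scheme follows. It is illustrative only:
the function names are hypothetical, and the details (upper-case hexadecimal
digits, padding to at least two digits) are assumptions rather than a
description of what Lucid Common Lisp actually prints.

```lisp
;; Produce a name of the form cXX for CHAR, where XX is its code in
;; hexadecimal (at least two digits; more for larger codes).
(defun hex-char-name (char)
  (format nil "c~2,'0X" (char-code char)))

;; Recover the character from a name produced by HEX-CHAR-NAME.
(defun parse-hex-char-name (name)
  (code-char (parse-integer name :start 1 :radix 16)))
```

Because the name is built from base characters alone, it can be written to
and read back from a base-character-only stream regardless of the character
it denotes.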
Storing extended characters in base strings.
I'm assuming that "this is an error"; if so it needs to be noted
in the appropriate places in CLtL (setf of sbchar, replace, etc.)
Glyphs and repertoires
I guess I just don't understand what this is all about. After several
readings of the relevant sections of the proposal, I think I understand the
glyph/character and character set/character repertoire abstractions, but it
still strikes me as much ado about nothing. Of course it is possible
for display devices, printers, keyboards, operating systems and window
systems to display, print, input, and/or translate characters in any manner
whatsoever, but what does any of this have to do with Common Lisp?
As for Common Lisp itself, of course it shares the same freedom as any other
piece of hardware or software in this regard, and if an implementation chooses
it could interpret the glyph "a" typed by a user as LZ01, I suppose.
At most it seems that the following fact is worth noting:
"An implementation may choose to document idiosyncrasies of the way
some characters are mapped from I/O devices to internal 'graphics symbols'
and still call itself Common Lisp." (the inability of most IBM terminals
to print the character [ comes to mind).
Certainly nobody is proposing that the glyphs used throughout the
definitions of "all Common Lisp functions, macros, constants, and global
variables" in CLtL be replaced with the corresponding character IDs
from the table of A.2.2.1 (p14). Of course not, because the glyphs used in
examples in the language definition as well as the glyphs used in any
reasonable implementation had better correspond pretty closely to
the standard glyphs in the table in A.2.2.1.
On the other hand, maybe I just don't understand the issues here. If so,
I don't think I'll be alone in this regard, so perhaps more motivating
arguments for the introduction of this terminology should be added to the
documentation.
JEIDA proposal
As I mentioned earlier, I believe that a new draft of the JEIDA proposal
is due sometime in September. Are there plans for including input from this
source in the final draft that is presented to X3J13 in October?
Typos
Although the definitions have been dropped from this draft of the proposal,
the terms "extended string" and "code point" still occur in several places.
David Unietis
Lucid, Inc.
∂14-Sep-88 1206 CL-Characters-mailer DC meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 14 Sep 88 12:05:53 PDT
Date: Wed, 14 Sep 88 11:40:06 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880914.114006.baggins@IBM.com>
Subject: DC meeting
The results of our voting were:
Linden -- yes
Masinter -- no vote received
Beckerle -- yes
Kerns -- no vote received (told he is unavailable in Japan)
Beiser -- yes
Layer -- yes
Thus, we will distribute the document to X3J13. I am finishing
some editorial modifications and will incorporate many of the
comments received (these will be discussed
in a separate note and at DC).
----------------------------------------
I discovered Carl Hoffman is no longer at ILA .. since he
hasn't been active on the subcommittee, and hasn't seen the
proposal (to my knowledge) I won't list him on the front.
----------------------------------------
We also received significant comments from LUCID, and general
agreement with the proposal.
----------------------------------------
I would like to hold an all day meeting on Monday 10 Oct prior
to the next X3J13 meeting. I'll get back with the location and
precise times (circa 9 to 5).
The subject of the meeting is to discuss any and all points on
the CS proposal.
Regards,
Thom
∂14-Sep-88 1403 CL-Characters-mailer DC meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 14 Sep 88 14:02:52 PDT
Date: Wed, 14 Sep 88 12:02:37 PDT
From: Thom Linden <baggins@ibm.com>
To: "Robert F. Mathis" <mathis@a.isi.edu>
cc: Jan Zubkoff <edsel!jlz@labrea.stanford.edu>,
"X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880914.120237.baggins@IBM.com>
Subject: DC meeting
Bob/Jan,
The Characters subcommittee will be meeting on Monday 10 Oct.
We need a room from 9:30 to 5pm for 4 to 6 attendees.
We also have a proposal to submit to the full committee. I hope to
have the final revisions completed today. Please let me know how
you would like it distributed (vnet to you, mail to you, both?)
It is written using only a few facilities of LaTeX (esp. tabular).
If you send me a mailing list, I would also be able to
distribute printed copies (as early as tomorrow).
I would expect this proposal to take at least 3 hours of full
committee time (with about 1 hr subcommittee review at the start).
I wish to have a position vote by the full committee at the DC
meeting on the following:
1a) accept, future revisions to be handled by editorial subcommittee
or 1b) accept, with direction for revisions in specific sections,
specific sections to be revised by the characters subcommittee
2) submit (stipulating 1a or 1b) to ISO at their November meeting
Regards,
Thom
∂18-Sep-88 1649 CL-Characters-mailer DC meeting
Received: from AI.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 18 Sep 88 16:49:20 PDT
Date: Sun, 18 Sep 88 19:54:41 EDT
From: "Robert W. Kerns" <RWK@AI.AI.MIT.EDU>
Subject: DC meeting
To: baggins@IBM.COM
cc: cl-characters@SAIL.STANFORD.EDU
Message-ID: <445848.880918.RWK@AI.AI.MIT.EDU>
Date: Wed, 14 Sep 88 11:40:06 PDT
From: Thom Linden <baggins at ibm.com>
The results of our voting were:
Linden -- yes
Masinter -- no vote received
Beckerle -- yes
Kerns -- no vote received (told he is unavailable in Japan)
Here I am. Actually, I've been back for about three weeks, but it's
taken a while to get my modem hooked up again, since I moved my Mac.
I plan to make other arrangements for mail shortly, anyway.
If someone could please send me a copy of the document, as either ASCII text or
Microsoft Word format, on either IBM 360K or 1.2M 5.25" floppies or Mac
floppies, I'll see about getting you comments ASAP. Thanks.
Beiser -- yes
Layer -- yes
Thus, we will distribute the document to X3J13. I am finishing
some editorial modifications and will incorporate many of the
comments received (these will be discussed
in a separate note and at DC).
----------------------------------------
I discovered Carl Hoffman is no longer at ILA .. since he
hasn't been active on the subcommittee, and hasn't seen the
proposal (to my knowledge) I won't list him on the front.
He spends most of his time in Japan these days, but is here in
the US at the moment. If you'll get me a copy I'll get a copy to
him, if he's still interested.
----------------------------------------
We also received significant comments from LUCID, and general
agreement with the proposal.
----------------------------------------
I would like to hold an all day meeting on Monday 10 Oct prior
to the next X3J13 meeting. I'll get back with the location and
precise times (circa 9 to 5).
The subject of the meeting is to discuss any and all points on
the CS proposal.
Regards,
Thom
∂23-Sep-88 1038 CL-Characters-mailer october meeting note
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 23 Sep 88 10:38:41 PDT
Date: Fri, 23 Sep 88 09:53:06 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880923.095306.baggins@IBM.com>
Subject: october meeting note
I'm mailing the following note along with the proposal today. I'll
also send the LaTeX form of the proposal to cl-characters (split
into parts due to postal problems).
Regards,
Thom
-----------------------------------------------------------------
The Characters subcommittee proposal for extending Common LISP
to support multiple and large character sets is a topic for
discussion and vote at the Washington D.C. meeting in October.
I have included a copy of the proposal for your review. I would
encourage editorial comments and minor corrections be sent directly
to me at the address above or via
csnet to cl-characters@sail.stanford.edu. Other review comments
may be sent to common-lisp@sail.stanford.edu or stated at the
October meeting.
The characters subcommittee is requesting the following
position votes by X3J13 at the Washington D.C. meeting:
1a) Accept for inclusion in the draft standard.
1b) Accept conditionally with
specific revisions to be incorporated by the characters subcommittee.
2) Submit the proposal (stipulating 1a or 1b) to
ISO WG16 at their November meeting.
∂23-Sep-88 1040 CL-Characters-mailer proposal part 1
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 23 Sep 88 10:38:56 PDT
Date: Fri, 23 Sep 88 09:55:21 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880923.095521.baggins@IBM.com>
Subject: proposal part 1
\documentstyle{report} % Specifies the document style.
\pagestyle{headings}
\title{\bf DRAFT:
Extensions to Common LISP to Support International
Character Sets}
\author{
Michael Beckerle\thanks{Gold Hill Computers} \and
Paul Beiser\thanks{Hewlett-Packard} \and
Robert Kerns\thanks{Independent consultant} \and
Kevin Layer\thanks{Franz, Inc.} \and
Thom Linden\thanks{IBM Research, Subcommittee Chair} \and
Larry Masinter\thanks{XEROX Research}
}
\date{Sept 9, 1988} % Deleting this command produces today's date.
\begin{document}
\maketitle % Produces the title.
\setcounter{secnumdepth}{4}
\setcounter{tocdepth}{4}
\tableofcontents
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\newfont{\cltxt}{cmr10}
\newfont{\clkwd}{cmtt10}
\newcommand{\apostrophe}{\clkwd '}
\newcommand{\bq}{\clkwd\symbol{'22}}
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Introduction}
This is a proposal for both extending and modifying the Common LISP
language definition to provide a standard basis for Common LISP
support of the variety of character sets used to represent the
native languages of the international community.
This proposal was created by the Character Subcommittee of X3J13.
We would like to acknowledge discussions with T. Yuasa and other
members of the JEIDA Technical Working Group,
comments on earlier versions of this proposal by David Unietis at
LUCID Inc.,
the JEIDA proposal \cite{ida87}
as well as the
proposals \cite{linden87} and \cite{kerns87} for
providing the initial motivation and direction for these extensions.
As all these documents and discussions were
expressly for Common LISP standardization usage,
we have borrowed freely from their ideas as well as the texts
themselves.
This document is separated into two parts. The first part explains the
major language changes and their motivations. The second part,
Appendix A, provides
the page by page set of editorial changes to \cite{steele84}.
\section{Objectives}
The major objectives of this proposal are:
\begin{itemize}
\item To provide a consistent, well-defined scheme allowing support
of both very large character sets and multiple character sets.
Many native
languages, such as Japanese and Chinese, use character
sets which contain more characters than the Roman alphabet.
Supporting larger sized character sets frequently means employing
larger data fields to uniquely encode each
character.
Common LISP implementations using
larger sized character sets
can
incur performance penalties in terms
of space, time, or both.
Many software applications are intended for international use, or
have requirements for incorporation of language elements of multiple
native
languages within a single application.
In order
to ensure some portability of these applications, data expressed in
a mixture of
native
languages must be treated consistently by the
software language.
\item To ensure efficient performance of string and character
operations.
The use of large and/or multiple character sets by an implementation
implies the need for a more complex character type representation.
Given a more complex character representation, the efficiency
of language operations on characters (e.g. string operations)
could be affected.
\item To assure forward compatibility of the proposed model
and definition with existing Common LISP implementations.
Developers should not be required to re-write large amounts of either
LISP code or data representations in order to apply the proposed
changes to existing implementations.
The proposed changes should provide an easy
portability path for existing code to many possible implementations.
\end{itemize}
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\chapter{Overview}
We use several terms within this document which
are new in the context of Common LISP.
Definitions for the following prominent
terms are provided for the reader's convenience.
A {\em character repertoire} defines a collection of characters
independent of their specific rendered image or font. Character
repertoires are specified independent of coding and their characters
are only identified with a unique label, a graphic symbol, and
a character description.
Once defined, a character repertoire must be
{\em encoded} to allow a one-to-one mapping between a character
and a number that serves as the character code. An encoded repertoire
is called a {\em coded character set}.
In Common LISP a {\em character} data object is identified by its
{\em character code}, a unique numerical code identification.
Each character code is composed from
a {\em character set identifier},
shared by all characters of a particular character
set, and a {\em character set index}, a numerical identification which
is unique within a particular character set.
Character data objects which are classified as {\em graphic},
or displayable, are each associated with a {\em glyph}. The
glyph is the visual representation of the character.
The primary purpose of introducing these terms is to provide a
consistent naming to Common LISP concepts which are identical
to those found in ISO standardization of coded
character sets. They also serve as a demarcation between these
standardization activities. For example, while Common LISP is free to
define unique repertoires and facilities to manipulate them, it should
not define character encodings.
%----------------------------------------------------------------------
\section{Character Identity}
Characters are uniquely distinguished by their codes,
which are drawn from the set of
non-negative integers.
It is important to separate the notion of glyph from the notion of
character data object when defining a scheme under which issues of
identity can be rigorously decided by a computer language. Glyphs are
the visual aspects of characters, writable on surfaces, and sometimes
called 'graphics'. A language specification valid for more than a
narrow range of systems can only make assumptions about the existence
of {\em abstract} glyphs (for example, the Latin letter A) and not about
glyph variants (for example, the italicized Latin letter {\em A})
\footnote{The latter are often referred to as {\em designer} glyphs.}
or characteristics of display devices. Thus, a key element of this
proposal is the removal of the {\em font} and {\em bits}
attributes from the language specification.\footnote{These and other
attributes may still be supported by an implementation but they
are extensions which do not affect the
{\clkwd char-equal} identity of the character
object.}
Character codes are composed from a character set identifier and a
character set index.
Within a given character set, individual member
characters are distinguished by character set index.
\footnote{
We specifically do not propose any standard encoding for
any character repertoires.
}
An implementation need
not support more than one character set, the {\em base} character set.
If it does support multiple
character sets, it must define the sets supported and
their characteristics. Character set identifiers are assigned to
character sets by the implementation.
\footnote{
We do not propose any standard character set
identifiers but names such as {\clkwd :ISO8859-1988} come to mind.}
The convention by which the character set index
and character set identifier are composed into a single integer code
is implementation dependent.
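
As a purely illustrative sketch of one such convention (not a proposed
standard), an implementation might place a number identifying the character
set in the high-order bits of the code and the character set index in the
low-order 16 bits. The function names, the use of a small integer to stand
for the character set identifier, and the 16-bit field width are all
assumptions made for this example only.
\begin{verbatim}
;; Assumed width of the character set index field, in bits.
(defparameter +index-bits+ 16)

;; Compose a character code from a set number and an index.
(defun compose-code (set-number index)
  (+ (ash set-number +index-bits+) index))

;; Recover the set number from a composed code.
(defun code-set-number (code)
  (ash code (- +index-bits+)))

;; Recover the character set index from a composed code.
(defun code-set-index (code)
  (ldb (byte +index-bits+ 0) code))
\end{verbatim}
Under this assumed layout, set number 0 with indices below 65536 yields
codes equal to the index itself, so a base character set assigned number 0
keeps its existing codes.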
Characters within the base character set are referred to as
{\em base characters}. Characters not in the base character set
are referred to as {\em extended characters}.
One ramification is that the distinction between {\clkwd string-char}
and {\clkwd character} is eliminated. {\bf All} characters can be
inserted into (type compatible) strings.
For compatibility, {\clkwd string-char}
is defined as equivalent to {\clkwd character}. All functions
dealing with the {\em bits} and {\em font} attributes are either
removed or modified by this proposal.
A second ramification
is that the {\clkwd characterp} predicate is extended to
support testing
membership of a character in a given character repertoire
or subrepertoire.
\footnote{
For example,
testing membership in the Kanji subrepertoire.
}
A third ramification is that I/O functions must be modified to manage
the interaction between the Common LISP treatment of character sets and
the external environment.
The
intent of the provision for multiple character sets
is that
native
language glyph sets (with associated digits and
punctuation)
\footnote{For example, the glyphs on the keycaps of a particular
terminal, or any other glyph sets with a common use in graphics or
symbolic communication.
}
supported by user display
hardware should each be mapped by the I/O interface
into its own character set inside
LISP, all the members of which
share a common character set identifier.
\footnote{Of course, an implementation would be free to decide if and
how supported glyphs should be differentiated into sets.
}
Which glyph sets are supported by the overall computing system, the
details of the mapping of
glyphs to character set indices, and the particular character set
identifiers used, are left unspecified by Common LISP.
The diversity of glyph sets and character
encoding conventions in use worldwide and the desirability
of allowing LISP to manipulate symbolic elements from many
languages, perhaps simultaneously, mandate such a flexible approach.
%----------------------------------------------------------------------
\section{Hierarchy of Types}
A Common LISP
implementation is required to support at least one character
repertoire: the {\em base character repertoire}.
The base character repertoire
is distinguished from every other supported character repertoire in
several respects:
\begin{itemize}
\item
The standard characters are a subrepertoire of the base characters.
\item
Only members of the base character repertoire
can be elements of a base string.
\item
The base characters are, in general, the default characters for I/O
operations.
\end{itemize}
No upper bound is specified for the number of glyphs in the base
character repertoire--that
is implementation dependent. The lower bound is 96, the
number of standard characters defined for Common LISP.
We use the term {\em extended} to describe character repertoires beyond
the base repertoire.
The following type specifier is added as a subtype
of {\clkwd character}.
\begin{itemize}
\item {\clkwd base-character}
\end{itemize}
An implementation may support additional subtypes of {\clkwd character}
which may or may not be supertypes of {\clkwd base-character}.
The distinction of a base character set is largely a pragmatic
choice. It permits efficient handling of common situations, is
in some sense privileged for host system I/O, and can serve as an
intermediate basis for portability, less general than the standard
characters, but possibly more useful across a narrower range of
implementations.
Most computers have some ``natural'' character representation which
is a function of hardware instructions for dealing with characters,
as well as the organization of the file system. The natural character
representation is likely to be the smallest transaction unit permitted
for text file and terminal I/O operations. On a system with a record
based I/O paradigm, the natural character representation is likely to
be the smallest record quantum. On many computer systems,
this representation is a byte.
However, there are often multiple character sets supportable on a
computer, through the use of special display and entry hardware, which
are varying interpretations of the basic system character
representation. For example, EBCDIC and extended ASCII are two
different interpretations of the same 1-byte code representations.
Many countries have their own glyph-to-code mappings for 1-byte
character codes addressing the special requirements of national
languages. Differentiating between these sets, without reference to
display hardware, is a matter of convention, since they all use the
same set of code representations. When a single byte is not enough,
two or more bytes are sometimes used for character encoding. This
makes character handling even more difficult on machines where the
natural representation size is a byte, since not only is the semantic
value of a character code a matter of convention, which may vary
within the same computing system, but so is the identification of a
set of bits as a complete character code.
It is the intention of this proposal that the base character set of
Common LISP
be the natural characters of the host system: its composition
should be
determined by the code capacity of the natural file system and I/O
transaction representations, and its assumed display glyphs should be
those of the terminals most commonly employed.
There are several advantages to this scheme. Internal representation
of strings of just base characters can be more compact than
strings including extended characters.
Source programs are likely to consist predominantly of base characters
since the standard characters are a subset of the base character
repertoire. Parsing of pure base character text
can be more efficient than parsing of text including
extended characters.
I/O can be performed more simply
with base characters,
and they can be used as a basis for data representations to
be shared with other LISP sessions with potentially different
character set definitions or non-LISP processes.
{\em Implementation note}:
Although the readtable must be capable of
holding syntax information for all characters, the data
structure(s) used internally for the readtable may be segmented
into a section for each defined character set. Access for
base character syntax during the parsing of base strings may
be quicker than the general case since the table section is the
same for all component characters, and entries may be accessed
directly by character set index.
The standard characters are the 96 characters used in the Common LISP
definition {\bf or their equivalents}.
This was the Common LISP \cite{steele84} definition, but
{\em equivalents} is a vague term.
The standard characters are not defined by their glyphs, but by their
roles within the language. There are two aspects to the roles of the
standard characters: one is their role in reader and format control
string syntax; the second is their role as components of the names of
all Common LISP
functions, macros, constants, and global variables. As
long as an implementation chooses 96 characters
and treats those 96 in a manner consistent with
the language's specification for the standard characters (e.g.
the naming of functions), it doesn't matter what glyphs the I/O
hardware uses to represent those characters: they are the standard
characters. Any program or
data text written wholly in those characters
is portable through simple code conversion.
A mechanism, such as in \cite{linden87}, which supports establishment of
equivalency between distinct characters is not excluded by
this proposal.
\footnote{But, as with the font character attribute,
is not a mechanism standardized by the ANSI Common LISP definition.}
In general, the authors of this proposal favor the alternative
of ISO standardization of non-overlapping
coded character sets.\footnote{Given the difficulties inherent in the
international standardization process, this may not be a
realistic alternative.}
The {\clkwd string} type
is defined as
a vector of characters. More precisely, a string
is a specialized vector whose elements are of type
{\clkwd character} or a subtype of character. Three string types are
distinguished with standardized names: {\em base-string},
{\em most-general-string}, and {\em simple-base-string}.
All strings which are not base strings
are referred to as {\em extended strings}.
A base string can only contain base characters. A
{\clkwd most-general-string}
can contain any implementation-supported base or extended characters,
in any mixture.
All Common LISP functions defined to operate on strings operate
consistently on base strings and extended strings with the following
caveat: for any function which inserts a character into a string, it
is an error to insert an extended character
into a base string.
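{\em Example (illustrative only)}:
Because inserting an extended character into a base string is only
``an error'', a program that wants a definite failure has to check for
itself. The following is a minimal sketch of such a check. It assumes
that the host Lisp supplies a type named {\clkwd base-char} that plays the
role of the proposed {\clkwd base-character} type; the function name
{\clkwd checked-set-char} is hypothetical.

\begin{verbatim}
;; Assumption: the host Lisp supplies a type BASE-CHAR playing the role
;; of the proposed BASE-CHARACTER type.
(deftype base-character () 'base-char)

(defun checked-set-char (string index new-char)
  "Store NEW-CHAR at INDEX of STRING, signalling an error when NEW-CHAR
is not of the element type of STRING."
  (let ((element-type (array-element-type string)))
    (unless (typep new-char element-type)
      (error "~S cannot be stored in a string of element type ~S."
             new-char element-type))
    (setf (char string index) new-char)))

;; Usage: storing a base character into a base string succeeds.
(let ((s (make-string 3 :element-type 'base-character
                        :initial-element #\a)))
  (checked-set-char s 0 #\b)
  s)
\end{verbatim}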
An implementation may support string subtypes more general
than {\clkwd base-string} but more specialized than
{\clkwd most-general-string}.
For example, a hypothetical
implementation supporting Korean and Cyrillic repertoires
might provide:
\begin{itemize}
\item {\clkwd most-general-string} -- may contain Korean, Cyrillic or
base characters in any mixture.
\item {\clkwd region-specialized-string} -- may contain installation
selected repertoire (Korean/Cyrillic) or base characters in any
mixture.
\item {\clkwd base-string} -- may contain only base characters.
\end{itemize}
Though, clearly, portability of applications using
{\clkwd region-specialized-string} is limited, a performance
advantage might argue for its use.
Alternatively,
an implementation may define {\clkwd most-general-string}
as equivalent to {\clkwd base-string} and {\clkwd base-character}
as equivalent to {\clkwd character} in a host environment
supporting a large base character repertoire
including, say, Korean, Cyrillic and Latin
subrepertoires.
The {\clkwd coerce} function is extended to
allow for explicit coercion between base strings and extended strings.
During reader
construction of symbols, if all the characters
in the symbol's name are of type {\clkwd base-character},
then the name of the symbol will be stored as a base string.
Otherwise it will be stored as an extended string.
The base string type allows for more compact representation of strings
of base characters, which are likely to predominate in any system.
Note that in any particular implementation the base character set
need not be the
most compactly representable character set, since another might have
a smaller repertoire.
However, in most implementations base strings are
likely to be more space efficient than extended strings.
It has been suggested that either a single string type is
sufficient for large character set Common LISP implementations,
or that a hierarchy of string types could be used, in a manner
transparent to the user. A desire to flexibly support many different
character sets without compromising the efficiency of ordinary
applications led us to accept the need for more than one string type.
We believe that these choices reflect a minimal
modification of this aspect of the type system, and that
exposing the string types for user programs to negotiate in their own
way is the most reasonable approach.
%----------------------------------------------------------------------
\section{Streams and System I/O}
Much of the work of ensuring that a
Common LISP implementation operates
correctly in a multiple character set environment must be performed by
the I/O interface.
The system I/O interface, abstracted in
Common LISP as streams, is responsible
for ensuring that text input from outside LISP is properly mapped
into character sets internally, and that the inverse mapping
\footnote{Such an inverse may not exist.
An implementation might legally fold multiple
external character sets into a single internal set on input
(e.g. EBCDIC and ASCII).
}
is performed on output. It is beyond the scope of a language
definition to specify the details of this operation, but options
are specified which allow runtime indication from the user as to
what character sets a stream uses, and how the mappings
should be done. It is expected that implementations will provide
reasonable defaults and invocation options to accommodate desired use
at an installation.
Two keyword arguments are proposed as additions to {\clkwd open}:
\begin{itemize}
\item {\clkwd :character-set}
whose value would be:
\begin{itemize}
\item A name or list of names of
defined character sets in the form of keywords.
The default is the base character set when
{\clkwd :external-code-format} is also defaulted. If a non-default
value is specified for {\clkwd :external-code-format}, there may be a
different default for {\clkwd :character-set}.
\end{itemize}
\item {\clkwd :external-code-format}
whose value would be:
\begin{itemize}
\item
A keyword indicating an implementation recognized scheme for
representing one or more character sets with non-homogeneous codes.
\footnote{
For example, the SO/SI SBCS/DBCS convention used by IBM on 370
machines could be selected by a keyword like
{\clkwd :shift-delimited}.
The compact run-encoding convention defined by XEROX could be
selected by {\clkwd :run-encoded}.
The SBCS/DBCS convention based on
ASCII which uses leading bit patterns to distinguish two-byte codes
from one-byte codes could be selected by a keyword like
{\clkwd :high-byte-delimited}.
}
The default is the natural system character representation,
the base character representation.
As many {\clkwd :character-set} names must be provided as the
implementation requires for that external coding convention.
\footnote{
For example, if {\clkwd :shift-delimited} were the
{\clkwd :external-code-format} argument, two character set specifiers
would have to be provided.
}
\end{itemize}
\end{itemize}
These arguments are provided for input, output, and
bidirectional streams. All characters read from the streams will be
members of the character sets specified by the {\clkwd :character-set}
argument. It is an error to try to write a character other than a
member of
the specified sets to a stream. (This includes the
\#$\backslash${\clkwd Newline} character.
Implementations should provide for appropriate line division behavior
through the function {\clkwd terpri}.)
An implementation supporting multiple character sets
must allow for the external and
internal representation of characters to be separately (and perhaps
multiply) specified to {\clkwd open},
since there can be circumstances under
which more than one external representation for an internal character
set is in use, or more than one character set is mixed together in an
external representation convention.
In addition to supporting conversion at the system interface, the
language must allow user programs to determine how much space data
objects will require when output in whichever external representations
are available.
The new function {\clkwd external-width} takes a character object
or string as its required argument. It also takes an optional
{\em output-stream}.
It returns the number of host system character
representation quantum units
\footnote{
Same as the storage width of a base character, usually a byte.
}
required to externally store that object, using the indicated
representation convention. If the item cannot be represented in
that convention, the function returns {\clkwd nil}.
This function is necessary
to determine if internal strings can be written to fixed length
fields in databases or terminal screen templates. Note that this
function addresses the problem of storage width, and does not
address the problem of display width, which may involve calculating
screen width of strings printed in proportional fonts.
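{\em Example (illustrative only)}:
The kind of calculation {\clkwd external-width} performs can be sketched
for a single, hypothetical shift-delimited convention. The function name
{\clkwd shift-delimited-width} and all of the encoding details are
assumptions made for this sketch: characters with codes below 256 occupy
one byte, characters with codes below 65536 occupy two bytes inside a run
bracketed by a one-byte shift-out and a one-byte shift-in, and any other
character is not representable. An actual implementation would consult
the representation convention of the stream instead.

\begin{verbatim}
(defun shift-delimited-width (object)
  "Return the number of bytes needed to store OBJECT (a character or a
string) in the hypothetical shift-delimited format described above, or
NIL if some character cannot be represented."
  (let ((width 0)
        (in-double-byte-run nil))
    (loop for ch across (string object)
          for code = (char-code ch)
          do (cond ((< code 256)
                    ;; Leaving a double-byte run costs one shift-in byte.
                    (when in-double-byte-run
                      (incf width)
                      (setf in-double-byte-run nil))
                    (incf width 1))
                   ((< code 65536)
                    ;; Entering a double-byte run costs one shift-out byte.
                    (unless in-double-byte-run
                      (incf width)
                      (setf in-double-byte-run t))
                    (incf width 2))
                   (t (return-from shift-delimited-width nil))))
    ;; A run still open at the end must be closed by a shift-in byte.
    (when in-double-byte-run
      (incf width))
    width))
\end{verbatim}

A caller filling a fixed length field would compare the returned width
with the field length and treat a {\clkwd nil} result as ``cannot be
written in this format''.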
A new global variable {\clkwd *format-external-width*} is
introduced to direct
the {\clkwd format} function to
take the {\clkwd :external-code-format} of the associated
stream argument into account. This allows the directive parameters
to be used in producing columnar output, as long as the width
in bytes of the external code format corresponds to the
resulting width of the displayed or printed output.
%----------------------------------------------------------------------
∂23-Sep-88 1044 CL-Characters-mailer proposal part 2
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 23 Sep 88 10:40:28 PDT
Date: Fri, 23 Sep 88 09:56:04 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880923.095604.baggins@IBM.com>
Subject: proposal part 2
%----------------------------------------------------------------------
\newcommand{\edithead}{\begin{tabular}{l p{3.95in}}
\multicolumn{2}{l} }
\newcommand{\csdag}{\bf$\Rightarrow$\ddag}
\newcommand{\editstart}{}
\newcommand{\editend}{\\ & \end{tabular}}
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\appendix
\chapter{Editorial Modifications to CLtL}
The following sections specify the editorial changes needed in
CLtL to support the proposal. Section/subsection numbers and titles
match those found in \cite{steele84}. The notation
{\csdag x} denotes a reference to paragraph x within the
subsection (we count each individual example or metastatement
as 1 paragraph of text). When an entire paragraph is deleted,
the first few words of the paragraph are noted as an aid in
identifying the text location.
%----------------------------------------------------------------------
\setcounter{section}{1}
\section{Data Types} % 2
%----------------------------------------------------------------------
\edithead {\csdag 8}
\editstart
\\ \bf replace &
\cltxt
rich character set, including ways to represent characters of various
type styles.
\\ \bf with &
\cltxt
rich character repertoire.
\editend
\setcounter{subsection}{1}
\subsection{Characters} % 2.2.
\edithead {\csdag 1}
\editstart
\\ \bf replace &
\cltxt
Characters are represented as data objects of type {\clkwd character}.
There are two subtypes of interest, called
{\clkwd standard-char} and {\clkwd string-char}.
\\ \bf with &
\cltxt
Characters are represented as data objects of type
{\clkwd character}.
\editend
\\
\edithead {\csdag 2}
\editstart
\\ \bf replace &
\cltxt
This works well enough for printing characters. Non-printing
characters
\\ \bf with &
\cltxt
This works well enough for graphic characters. Non-graphic
characters
\editend
\subsubsection{Standard Characters} % 2.2.1.
\edithead {\csdag 0 section heading}
\editstart
\\ \bf replace &
\cltxt
Standard Characters
\\ \bf with &
\cltxt
Base Characters
\editend
\\
\edithead {\csdag 1 before}
\editstart
\\ \bf insert &
\cltxt
Most computers have some "base" character representation which
is a function
of hardware instructions for dealing with characters, as well as
the organization of
the file system. This base character representation is likely
to be the smallest
transaction unit permitted for text stream I/O operations.
The base character representation (often a byte) supports an
implementation specific
{\em coded base character set} such as the ASCII and the EBCDIC
coded character sets.
The {\em base character repertoire} is defined as
the collection of characters
contained in the coded base character set. Common LISP does
not define the base
character encoding
but does require all implementations to support a "standard"
{\em subrepertoire} of the base character
repertoire.
\editend
\\
\edithead {\csdag 1 before}
\editstart
\\ \bf insert &
\cltxt
The {\clkwd base-character} type is defined as a subtype of
{\clkwd character}. A {\clkwd base-character}
object can contain any member of the base character repertoire.
Objects of type
{\clkwd (and character (not base-character))} are referred to
as {\em extended characters}.
\editend
\\
\edithead {\csdag 1}
\editstart
\\ \bf delete &
\cltxt
Common LISP defines a "standard character set" ...
\editend
\\
\edithead {\csdag 1}
\editstart
\\ \bf new &
\cltxt
As a subset of the base character repertoire,
Common LISP defines a standard character
subrepertoire for two purposes.
Common LISP programs that are written in the
standard character subrepertoire
can be read by any Common LISP implementation; and Common LISP
programs
that use only standard characters as data objects are most likely
to be portable.
The standard characters are not defined by their glyphs, but by their
roles within
the language. There are two aspects to the roles of the
standard characters:
one is their role in reader and format control
string syntax; the second is their role as
components of the names of all Common LISP
functions, macros, constants, and global
variables. As long as an implementation chooses 96 characters
and treats those 96 in a manner
consistent with the language's specification for the standard characters
(for example,
the naming of functions),
it doesn't matter what glyphs the I/O
hardware uses to
represent those characters: they are
the standard characters. Any program or
data text written wholly
in those characters
is portable through simple code conversion.
The Common LISP
standard character subrepertoire
consists of a space character \#$\backslash${\clkwd Space}, a newline
\#$\backslash${\clkwd Newline}, and the
following ninety-four graphic characters or their equivalents:
\editend
\\
\edithead {\csdag 2}
\editstart
\\ \bf delete &
\cltxt
! " \# ...
\editend
\\
\edithead {\csdag 2 new}
\editstart
\\ &
{\bf Common LISP Standard Character Subrepertoire}
\editend
\footnote{\cltxt \#$\backslash${\clkwd Space}
and \#$\backslash${\clkwd Newline} are omitted.
Graphic identifiers and descriptions are from ISO 6937/2.}
\\
{\small \begin{tabular}{||l|c|l||l|c|l||} \hline
ID & Glyph & Name or description
& ID & Glyph & Name or description
\\ \hline
LA01 & a & small a
& ND01 & 1 & digit 1
\\ \hline
LA02 & A & capital A
& ND02 & 2 & digit 2
\\ \hline
LB01 & b & small b
& ND03 & 3 & digit 3
\\ \hline
LB02 & B & capital B
& ND04 & 4 & digit 4
\\ \hline
LC01 & c & small c
& ND05 & 5 & digit 5
\\ \hline
LC02 & C & capital C
& ND06 & 6 & digit 6
\\ \hline
LD01 & d & small d
& ND07 & 7 & digit 7
\\ \hline
LD02 & D & capital D
& ND08 & 8 & digit 8
\\ \hline
LE01 & e & small e
& ND09 & 9 & digit 9
\\ \hline
LE02 & E & capital E
& ND00 & 0 & digit 0
\\ \hline
LF01 & f & small f
& SC03 & \$ & dollar sign
\\ \hline
LF02 & F & capital F
& SP02 & ! & exclamation mark
\\ \hline
LG01 & g & small g
& SP04 & " & quotation mark
\\ \hline
LG02 & G & capital G
& SP05 & \apostrophe & apostrophe
\\ \hline
LH01 & h & small h
& SP06 & ( & left parenthesis
\\ \hline
LH02 & H & capital H
& SP07 & ) & right parenthesis
\\ \hline
LI01 & i & small i
& SP08 & , & comma
\\ \hline
LI02 & I & capital I
& SP09 & \_ & low line
\\ \hline
LJ01 & j & small j
& SP10 & - & hyphen or minus sign
\\ \hline
LJ02 & J & capital J
& SP11 & . & full stop, period
\\ \hline
LK01 & k & small k
& SP12 & / & solidus
\\ \hline
LK02 & K & capital K
& SP13 & : & colon
\\ \hline
LL01 & l & small l
& SP14 & ; & semicolon
\\ \hline
LL02 & L & capital L
& SP15 & ? & question mark
\\ \hline
LM01 & m & small m
& SA01 & + & plus sign
\\ \hline
LM02 & M & capital M
& SA03 & $<$ & less-than sign
\\ \hline
LN01 & n & small n
& SA04 & = & equals sign
\\ \hline
LN02 & N & capital N
& SA05 & $>$ & greater-than sign
\\ \hline
LO01 & o & small o
& SM01 & \# & number sign
\\ \hline
LO02 & O & capital O
& SM02 & \% & percent sign
\\ \hline
LP01 & p & small p
& SM03 & \& & ampersand
\\ \hline
LP02 & P & capital P
& SM04 & * & asterisk
\\ \hline
LQ01 & q & small q
& SM05 & @ & commercial at
\\ \hline
LQ02 & Q & capital Q
& SM06 & [ & left square bracket
\\ \hline
LR01 & r & small r
& SM07 & $\backslash$ & reverse solidus
\\ \hline
LR02 & R & capital R
& SM08 & ] & right square bracket
\\ \hline
LS01 & s & small s
& SM11 & \{ & left curly bracket
\\ \hline
LS02 & S & capital S
& SM13 & $|$ & vertical bar
\\ \hline
LT01 & t & small t
& SM14 & \} & right curly bracket
\\ \hline
LT02 & T & capital T
& SD13 & \bq & grave accent
\\ \hline
LU01 & u & small u
& SD15 & $\hat{ }$ & circumflex accent
\\ \hline
LU02 & U & capital U
& SD19 & $\tilde{ }$ & tilde
\\ \hline
LV01 & v & small v
& & &
\\ \hline
LV02 & V & capital V
& & &
\\ \hline
LW01 & w & small w
& & &
\\ \hline
LW02 & W & capital W
& & &
\\ \hline
LX01 & x & small x
& & &
\\ \hline
LX02 & X & capital X
& & &
\\ \hline
LY01 & y & small y
& & &
\\ \hline
LY02 & Y & capital Y
& & &
\\ \hline
LZ01 & z & small z
& & &
\\ \hline
LZ02 & Z & capital Z
& & &
\\
\hline
\end{tabular} }
\\
\edithead {\csdag 3}
\editstart
\\ \bf delete &
\cltxt
@ A B C...
\editend
\\
\edithead {\csdag 4}
\editstart
\\ \bf delete &
\cltxt
\bq a b c...
\editend
\\
\edithead {\csdag 5}
\editstart
\\ \bf delete &
\cltxt
The Common LISP Standard character set is apparently ...
\editend
\\
\edithead {\csdag 6}
\editstart
\\ \bf replace &
\cltxt
Of the ninety-four non-blank printing characters
\\ \bf with &
\cltxt
Of the ninety-four graphic characters
\editend
\\
\edithead {\csdag 9}
\editstart
\\ \bf delete &
\cltxt
The following characters are called ...
\editend
\\
\edithead {\csdag 10}
\editstart
\\ \bf delete &
\cltxt
{\clkwd \#$\backslash$Backspace \#$\backslash$Tab } ...
\editend
\\
\edithead {\csdag 11}
\editstart
\\ \bf delete &
\cltxt
Not all implementations of Common ...
\editend
\subsubsection{Line Divisions} % 2.2.2.
\subsubsection{Non-standard Characters} % 2.2.3.
\edithead {\csdag delete entire section}
\editstart
\editend
\subsubsection{Character Attributes} % 2.2.4.
\edithead {\csdag 0 section heading}
\editstart
\\ \bf replace &
\cltxt
Character Attributes
\\ \bf with &
\cltxt
Character Identity
\editend
\\
\edithead {\csdag 1 through 8}
\editstart
\\ \bf delete all paragraphs&
\cltxt
Every object of type {\clkwd character} ...
\editend
\\
\edithead {\csdag 1}
\editstart
\\ \bf new &
\cltxt
A data object of type {\clkwd character} is identified by its
{\em character code}, a unique numerical code identification.
Each character code is composed from
a {\em character set identifier},
shared by all characters of a particular character
set, and a {\em character set index}, a numerical identification which
is unique within a particular character set.
\\ &
An implementation need
not support more than one character set, the {\em base} character set.
If it does support multiple
character sets, it must define the sets supported and
their characteristics. Character set identifiers are assigned to
character sets by the implementation.
The convention by which the character set index
and character set identifier are composed into a single integer code
is implementation dependent.
\\ &
Characters within the base character set are referred to as
{\em base characters}. Characters not in the base character set
are referred to as {\em extended characters}.
\\ &
\\ & \bf Compatibility note: -------------
\\ &
For compatibility with earlier versions of Common LISP incorporating
various attributes of character objects, see chapter 13 for a
discussion of implementation-dependent attributes.
\\ & \bf --------------------------------------------
\editend
\subsubsection{String Characters} % 2.2.5.
\edithead {\csdag delete entire section}
\editstart
\editend
\subsection{Symbols} % 2.3.
\edithead {\csdag 12}
\editstart
\\ \bf replace &
\cltxt
A symbol may have uppercase letters, lowercase letters, or both
in its print name.
\\ \bf with &
\cltxt
A symbol may have characters from any supported character repertoire
in its print name.
It may have uppercase letters, lowercase letters, or both.
\editend
\setcounter{subsection}{4}
\subsection{Arrays}
\subsubsection{Vectors}
\edithead {\csdag 6}
\editstart
\\ \bf replace &
\cltxt
All implementations provide specialized arrays for the cases when
the components are characters (or rather, a special subset of the
characters);
\\ \bf with &
\cltxt
All implementations provide specialized arrays for the cases when
the components are characters (or optionally, special subsets of
the characters);
\editend
\subsubsection{Strings}
\edithead {\csdag 1}
\editstart
\\ \bf replace &
\cltxt
A string is simply a vector of characters. More precisely, a string
is a specialized vector whose elements are of type
{\clkwd string-char}.
\\ \bf with &
\cltxt
A string is simply a vector of characters. More precisely, a string
is a specialized vector whose elements are of type
{\clkwd character} or a subtype
of character.
\editend
\setcounter{subsection}{14}
\subsection{Overlap, Inclusion, and Disjointness of Types} % 2.15.
\edithead {\csdag 14}
\editstart
\\ \bf replace &
\cltxt
The type {\clkwd standard-char} is a subtype of {\clkwd string-char};
{\clkwd string-char} is a subtype of {\clkwd character}.
\\ \bf with &
\\ & \bf Compatibility note: -------------
\\ &
\cltxt
The type {\clkwd standard-char} is a subtype of
{\clkwd base-character};
the type {\clkwd string-char} means {\clkwd character}. Both
are retained for compatibility with earlier versions of Common LISP.
\\ & \bf --------------------------------------------
\editend
\\
\edithead {\csdag 15}
\editstart
\\ \bf replace &
\cltxt
The type {\clkwd string} is a subtype of {\clkwd vector},
for {\clkwd string} means {\clkwd (vector string-char)}.
\\ \bf with &
\cltxt
The type {\clkwd string} is a subtype of {\clkwd vector};
{\clkwd string} consists of vectors specialized by subtypes of
{\clkwd character}.
\editend
\\
\edithead {\csdag 15 after}
\editstart
\\ \bf insert &
\cltxt
The type {\clkwd base-string} means
{\clkwd (vector base-character)}.
\editend
\\
\edithead {\csdag 15 after}
\editstart
\\ \bf insert &
\cltxt
The type {\clkwd most-general-string} means
{\clkwd (vector character)} and is a subtype of {\clkwd string}.
\editend
\\
\edithead {\csdag 20}
\editstart
\\ \bf replace &
\cltxt
{\clkwd (simple-array string-char (*))};
\\ \bf with &
\cltxt
{\clkwd (simple-array character (*))};
\editend
\\
\edithead {\csdag 20 after}
\editstart
\\ \bf insert &
\cltxt
The type {\clkwd simple-base-string} means
{\clkwd (simple-array base-character (*))} and
is the most efficient string which can hold
the standard character repertoire.
\editend
%----------------------------------------------------------------------
\setcounter{section}{3}
\section{Type Specifiers} % 4
%----------------------------------------------------------------------
\setcounter{subsection}{1}
\subsection{Type Specifier Lists} % 4.2.
\edithead {\csdag 8 Table 4-1 (alphabetic list)}
\editstart
\\ \bf remove &
\\ &
\cltxt
{\clkwd standard-char}
\\ &
{\clkwd string-char}
\editend
\\
\edithead {\csdag 8 Table 4-1 (alphabetic list)}
\editstart
\\ \bf insert &
\\ &
\cltxt
{\clkwd base-character}
\\ &
{\clkwd most-general-string}
\\ &
{\clkwd simple-base-string}
\editend
\setcounter{subsection}{2}
\subsection{Predicating Type Specifiers} % 4.3.
\edithead {\csdag 2}
\editstart
\\ \bf delete &
\cltxt
As an example, the entire ...
\editend
\\
\edithead {\csdag 3 delete example}
\editstart
\\ \bf delete &
\cltxt
{\clkwd (deftype string-char () } ...
\editend
\setcounter{subsection}{5}
\subsection{Type Specifiers That Abbreviate} % 4.6.
\edithead {\csdag 20}
\editstart
\\ \bf replace &
\cltxt
Means the same as {\clkwd (array string-char ({\em size}))}: the set of
strings of
the indicated size.
\\ \bf with &
\cltxt
Means the union of the vector types specialized by subtypes of
character
and having the indicated size.
\editend
\\
\edithead {\csdag 23}
\editstart
\\ \bf replace &
\cltxt
Means the same as {\clkwd (simple-array string-char ({\em size}))}: the
set of simple strings of the indicated size.
\\ \bf with &
\cltxt
Means the same as {\clkwd (simple-array character ({\em size}))}: the
set of simple strings of the indicated size.
\editend
\\
\edithead {\csdag 23 after}
\editstart
\\ \bf insert &
\cltxt
{\clkwd (base-string {\em size})}
\\ &
Means the same as {\clkwd (array base-character ({\em size}))}: the
set of base strings of the indicated size.
\\ &
{\clkwd (simple-base-string {\em size})}
\\ &
Means the same as {\clkwd (simple-array base-character ({\em size}))}:
the set of simple base strings of the indicated size.
\editend
\setcounter{subsection}{7}
\subsection{Type Conversion Function} % 4.8.
\edithead {\csdag 6}
\editstart
\\ \bf replace &
\cltxt
then the sole element of the print name is returned.
If {\em object} is an integer {\em n}, then {\clkwd (int-char }
{\em n}{\clkwd )} is returned. See {\clkwd character}.
\\ \bf with &
\cltxt
then the sole element of the print name is returned.
If {\em object} is an integer {\em n}, then {\clkwd (code-char }
{\em n}{\clkwd )} is returned. See {\clkwd character}.
\editend
\\
\edithead {\csdag 6 after}
\editstart
\\ \bf insert &
\begin{itemize}
\cltxt
\item Any string subtype may be converted to any other string
subtype, provided the new string can contain all actual
elements of the old string. It is an error if it cannot.
\end{itemize}
\editend
%----------------------------------------------------------------------
\setcounter{section}{5}
\section{Predicates} % 6
%----------------------------------------------------------------------
\edithead {\csdag 2}
\editstart
\\ \bf replace &
\cltxt
but {\clkwd standard-char} begets {\clkwd standard-char-p}
\\ \bf with &
\cltxt
but {\clkwd bit-vector} begets {\clkwd bit-vector-p}
\editend
\setcounter{subsection}{1}
\subsection{Data Type Predicates} % 6.2.
\setcounter{subsubsection}{1}
\subsubsection{Specific Data Type Predicates} % 6.2.2.
\edithead {\csdag 36}
\editstart
\\ \bf replace &
\cltxt
{\clkwd characterp} {\em object}
\\ \bf with &
\cltxt
{\clkwd characterp} {\em object} \&{\clkwd optional}
({\em repertoire})
\editend
\\
\edithead {\csdag 37}
\editstart
\\ \bf replace &
\cltxt
{\clkwd characterp} is true if its argument is a character,
and otherwise is false.
\\ \bf with &
\cltxt
If {\em repertoire} is omitted, {\clkwd characterp}
is true if its argument is a character object,
and otherwise is false.
If a {\em repertoire} keyword argument is specified,
{\clkwd characterp} is true if its argument
is a character object and a member of the specified repertoire
or subrepertoire, and
otherwise is false.
For example, {\clkwd (characterp \#$\backslash$A}
{\clkwd :standard)}
is true since \#$\backslash$A is a member of the standard character
subrepertoire.
\editend
\\
\edithead {\csdag 38}
\editstart
\\ \bf replace &
\cltxt
{\clkwd (characterp x) $\equiv$ (typep x \apostrophe character)}
\\ \bf with &
\cltxt
{\clkwd (characterp x :standard) $\equiv$ (typep x \apostrophe
(character :standard))}
\editend
\\
\edithead {\csdag 72}
\editstart
\\ \bf replace &
\cltxt
See also {\clkwd standard-char-p, string-char-p, streamp,}
\\ \bf with &
\cltxt
See also {\clkwd standard-char-p, streamp,}
\editend
\setcounter{subsubsection}{2}
\subsubsection{Equality Predicates} % 6.2.3.
\edithead {\csdag 75}
\editstart
\\ \bf replace &
\cltxt
which ignores alphabetic case and certain other attributes
of characters;
\\ \bf with &
\cltxt
which ignores alphabetic case
of characters;
\editend
%----------------------------------------------------------------------
\setcounter{section}{6}
\section{Control Structure} % 7
%----------------------------------------------------------------------
\setcounter{subsection}{1}
\subsection{Generalized Variables} % 7.2.
\edithead {\csdag 19 modify table}
\editstart
\\ \bf replace &
\cltxt
char string-char
\\ &
schar string-char
\\ \bf with &
\cltxt
char character
\\ &
schar character
\editend
\\
\edithead {\csdag 22 table entry}
\editstart
\\ \bf delete &
\cltxt
char-bit first set-char-bit
\editend
%----------------------------------------------------------------------
\setcounter{section}{9}
\section{Symbols} % 10
%----------------------------------------------------------------------
\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
It is ordinarily not permitted to alter a symbol's print name.
\\ \bf with &
\cltxt
It is an error to alter a symbol's print name.
\editend
\setcounter{subsection}{1}
\subsection{The Print Name} % 10.2.
\edithead {\csdag 5}
\editstart
\\ \bf replace &
\cltxt
It is an extremely bad idea
\\ \bf with &
\cltxt
It is an error and an extremely bad idea
\editend
%----------------------------------------------------------------------
\setcounter{section}{12}
\section{Characters} % 13
%----------------------------------------------------------------------
\edithead {\csdag 6 after}
\editstart
\\ \bf insert &
\cltxt
{\clkwd char-code-limit} [{\clkwd Constant}]
\\ &
The value of {\clkwd char-code-limit} is a non-negative integer
that is the upper exclusive bound on values produced by the
function {\clkwd char-code}, which returns the {\em code}
of a given character; that is, the values returned by
{\clkwd char-code} are non-negative and strictly less than
the value of {\clkwd char-code-limit}.
There may be unassigned codes between 0 and
{\clkwd char-code-limit} which
are not legal arguments to {\clkwd code-char}.
\\ & \bf Compatibility note: -------------
\\ &
Earlier versions of Common LISP incorporated {\em font} and
{\em bits} as attributes of character objects. These are considered
implementation-defined
attributes of character objects and, if supported by an implementation,
affect the action of selected functions:
\begin{itemize}
\item Attributes, such as those
dealing with how the character is displayed or its typography,
are not part of the character code.
For example, bold-face, color
or size are not considered part of the character code.
\item If two characters differ in any implementation-defined attributes,
then they are not {\clkwd char=}.
\item If two characters have identical implementation-defined
attributes, then their ordering by
{\clkwd char}$<$ is consistent with the numerical ordering by the
predicate $<$ on
their code attributes. (Similarly for {\clkwd char}$>$,
{\clkwd char}$>=$ and {\clkwd char}$<=$.)
\item {\clkwd char-equal} ignores implementation-defined attributes.
\item The effect of {\clkwd char-upcase} and {\clkwd char-downcase}
is to preserve implementation-defined attributes.
\item The function {\clkwd char-int} is equivalent to {\clkwd char-code}
if no implementation-defined attributes are associated with
the character object.
\item The function {\clkwd int-char} is equivalent to {\clkwd code-char}
if no implementation-defined attributes are associated with
the character object.
\item It is implementation dependent whether characters within
double quotes have implementation-defined attributes removed.
\item In symbol construction, implementation-defined attributes such as
color are removed.
\end{itemize}
\\ & \bf --------------------------------------------
\editend
\setcounter{subsection}{0}
\subsection{Character Attributes} % 13.1.
\edithead {\csdag delete entire section}
\editstart
\editend
\setcounter{subsection}{1}
\subsection{Predicates on Characters} % 13.2.
\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
argument is a "standard character" that is, an object of type
{\clkwd standard-char}.
Note that any character with a non-zero {\em bits} or {\em font}
attribute
is non-standard.
\\ \bf with &
\cltxt
argument is one of the Common LISP standard character subrepertoire.
\editend
\\
\edithead {\csdag 4}
\editstart
\\ \bf delete &
\cltxt
Note that any character with non-zero ...
\editend
\\
\edithead {\csdag 6}
\editstart
\\ \bf replace &
\cltxt
Of the standard characters all but \#$\backslash${\clkwd Newline}
are graphic.
The semi-standard characters \#$\backslash${\clkwd Backspace},
\#$\backslash${\clkwd Tab},
\#$\backslash${\clkwd Rubout},
\#$\backslash${\clkwd Linefeed},
\#$\backslash${\clkwd Return},
and \#$\backslash${\clkwd Page} are not graphic.
\\ \bf with &
\cltxt
Of the standard characters all but \#$\backslash${\clkwd Newline}
are graphic.
\editend
\\
\edithead {\csdag 7}
\editstart
\\ \bf delete &
\cltxt
Programs may assume that graphic ...
\editend
\\
\edithead {\csdag 8}
\editstart
\\ \bf delete &
\cltxt
Any character with a non-zero bits...
\editend
\\
\edithead {\csdag 9}
\editstart
\\ \bf delete &
\cltxt
{\clkwd string-char-p} ...
\editend
\\
\edithead {\csdag 10}
\editstart
\\ \bf delete &
\cltxt
The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 13}
\editstart
\\ \bf replace &
\cltxt
If a character is alphabetic, then it is perforce graphic. Therefore
any character
with a non-zero bits attribute cannot be alphabetic. Whether a
character is
alphabetic may depend on its font number.
\\ \bf with &
\cltxt
If a character is alphabetic, then it is perforce graphic.
\editend
\\
\edithead {\csdag 22}
\editstart
\\ \bf replace &
\cltxt
If a character is either uppercase or lowercase, it is necessarily
alphabetic (and
therefore is graphic, and therefore has a zero bits attribute).
However, it is permissible in theory for an alphabetic character
to be neither
uppercase nor lowercase (in a non-Roman font, for example).
\\ \bf with &
\cltxt
If a character is either uppercase or lowercase, it is necessarily
alphabetic (and
therefore is graphic).
\editend
\\
\edithead {\csdag 25}
\editstart
\\ \bf replace &
\cltxt
The argument {\em char} must be a character object, and {\em radix}
must be a non-negative
integer. If {\em char} is not a digit of the radix specified
\\ \bf with &
\cltxt
The argument {\em char} must be in the standard character
subrepertoire and
{\em radix} must be a non-negative integer.
If {\em char} is not a standard character or is not a digit of the
radix specified
\editend
\\
\edithead {\csdag 51}
\editstart
\\ \bf delete &
\cltxt
If two characters have the same bits ...
\editend
\\
\edithead {\csdag 52}
\editstart
\\ \bf replace &
\cltxt
If two characters differ in any attribute (code, bits, or font), then
they are different.
\\ \bf with &
\cltxt
If the codes of two characters differ, then
they are different.
\\ & \bf Compatibility note: -------------
\\ &
If two characters differ in any implementation-defined attributes,
then they are different.
\\ & \bf --------------------------------------------
\editend
\\
\edithead {\csdag 94}
\editstart
\\ \bf replace &
\cltxt
The predicate {\clkwd char-equal} is like {\clkwd char=}, and
similarly for the others, except
according to a different ordering such that differences of bits
attributes and case are ignored, and font information is taken into
account in an implementation dependent manner.
\\ \bf with &
\cltxt
The predicate {\clkwd char-equal} is like {\clkwd char=}, and
similarly for the others, except
according to a different ordering such that differences of case and
implementation-defined attributes are ignored.
\editend
\\
\edithead {\csdag 97 example}
\editstart
\\ \bf delete &
\cltxt
{\clkwd (char-equal \#$\backslash$A \#$\backslash$Control-A) is true}
\editend
\\
\edithead {\csdag 98}
\editstart
\\ \bf delete &
\cltxt
The ordering may depend on the font ...
\editend
\setcounter{subsection}{2}
\subsection{Character Construction and Selection} % 13.3.
\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
The argument {\em char} must be a character object.
{\clkwd char-code} returns the {\em code} attribute of the
character object;
this will be a non-negative integer less than the (normal) value
\\ \bf with &
\cltxt
The argument {\em char} must be a character object.
{\clkwd char-code} returns the {\em code} of the
character object;
this will be a non-negative integer less than the value
\editend
\\
\edithead {\csdag 4}
\editstart
\\ \bf delete &
\cltxt
{\clkwd char-bits } ...
\editend
\\
\edithead {\csdag 5}
\editstart
\\ \bf delete &
\cltxt
The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 6}
\editstart
\\ \bf delete &
\cltxt
{\clkwd char-font } ...
\editend
\\
\edithead {\csdag 7}
\editstart
\\ \bf delete &
\cltxt
The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 8}
\editstart
\\ \bf replace &
\cltxt
{\clkwd code-char {\em code} \&optional {\em (bits 0) (font 0)}
[{\em Function}]}
\\ \bf with &
\cltxt
{\clkwd code-char {\em code}
[{\em Function}]}
\editend
\\
\edithead {\csdag 9}
\editstart
\\ \bf replace &
\cltxt
All three arguments must be non-negative integers. If it is possible
in the
implementation to construct a character object whose code attribute
is {\em code},
whose
bits attribute is {\em bits}, and whose font attribute is {\em font},
then such an object
is returned;
\\ \bf with &
\cltxt
The argument must be a non-negative integer. If it is possible
in the
implementation to construct a character object identified by
{\em code},
then such an object is returned;
\editend
\\
\edithead {\csdag 10}
\editstart
\\ \bf replace &
\cltxt
For any integers, {\em c, b,} and {\em f}, if {\clkwd (code-char
{\em c b f})} is
\\ \bf with &
\cltxt
For any integer, {\em c}, if {\clkwd (code-char
{\em c})} is
\editend
\\
\edithead {\csdag 12}
\editstart
\\ \bf delete &
\cltxt
{\clkwd (char-bits (code-char } ...
\editend
\\
\edithead {\csdag 13}
\editstart
\\ \bf delete &
\cltxt
{\clkwd (char-font (code-char } ...
\editend
\\
\edithead {\csdag 14}
\editstart
\\ \bf delete &
\cltxt
If the font and bits attributes ...
\editend
\\
\edithead {\csdag 15}
\editstart
\\ \bf delete &
\cltxt
{\clkwd (char= (code-char (char-code ...}
\editend
\\
\edithead {\csdag 16}
\editstart
\\ \bf delete &
\cltxt
is true.
\editend
\\
\edithead {\csdag 17}
\editstart
\\ \bf delete &
\cltxt
{\clkwd make-char} ...
\editend
\\
\edithead {\csdag 18}
\editstart
\\ \bf delete &
\cltxt
The argument {\em char} must be ...
\editend
\\
\edithead {\csdag 19}
\editstart
\\ \bf delete &
\cltxt
If {\em bits} or {\em font} are zero ...
\editend
\setcounter{subsection}{3}
\subsection{Character Conversions} % 13.4.
\edithead {\csdag 8}
\editstart
\\ \bf replace &
\cltxt
{\clkwd char-upcase} returns a character object with the same
font and bits attributes as {\em char}, but with possibly a
different code attribute.
\\ \bf with &
\cltxt
{\clkwd char-upcase} returns a character object with possibly
a different code.
\editend
\\
\edithead {\csdag 10}
\editstart
\\ \bf replace &
\cltxt
Similarly, {\clkwd char-downcase} returns a character object with the
same font and bits attributes as {\em char}, but with possibly a
different code attribute.
\\ \bf with &
\cltxt
Similarly, {\clkwd char-downcase} returns a character object with
possibly a different code.
\editend
\\
\edithead {\csdag 12}
\editstart
\\ \bf delete &
\cltxt
Note that the action of ...
\editend
\\
\edithead {\csdag 13}
\editstart
\\ \bf replace &
\cltxt
{\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
({\em font} 0) [{\em Function}]}
\\ \bf with &
\cltxt
{\clkwd digit-char {\em weight} \&optional ({\em radix} 10)
[{\em Function}]}
\editend
\\
\edithead {\csdag 14}
\editstart
\\ \bf replace &
\cltxt
All arguments must be integers. {\clkwd digit-char} determines
whether or not it is
possible
to construct a character object whose font attribute is {\em font},
and whose {\em code}
\\ \bf with &
\cltxt
All arguments must be integers. {\clkwd digit-char} determines
whether or not it is
possible to construct a character object whose {\em code}
\editend
\\
\edithead {\csdag 15}
\editstart
\\ \bf replace &
\cltxt
{\clkwd digit-char} cannot return {\clkwd nil} if {\em font}
is zero, {\em radix}
\\ \bf with &
\cltxt
{\clkwd digit-char} cannot return {\clkwd nil} if
{\em radix}
\editend
\\
\edithead {\csdag 22}
\editstart
\\ \bf delete &
\cltxt
Note that no argument is provided for ...
\editend
\\
\edithead {\csdag 22 after}
\editstart
\\ \bf insert &
\\ & \bf Compatibility note: -------------
\\ &
The {\clkwd char-int} and {\clkwd int-char} functions are retained
for compatibility with earlier versions of Common LISP which support
implementation-defined attributes.
\editend
\\
\edithead {\csdag 24}
\editstart
\\ \bf replace &
\cltxt
The argument {\em char} must be a character object. {\clkwd char-int}
returns a non-negative integer encoding the character object.
\\ \bf with &
\cltxt
The argument {\em char} must be a character object. {\clkwd char-int}
returns a non-negative integer encoding the character object
including any implementation-defined attributes.
\editend
\\
\edithead {\csdag 25}
\editstart
\\ \bf replace &
\cltxt
If the font and bits attributes of {\em char} are zero, then
\\ \bf with &
\cltxt
If the implementation-defined attributes of {\em char} are zero, then
\editend
\\
\edithead {\csdag 30 after}
\editstart
\\ \bf insert &
\\ & \bf --------------------------------------------
\editend
\\
\edithead {\csdag 32}
\editstart
\\ \bf replace &
\cltxt
All characters that have zero font and bits attributes and that are
non-graphic
\\ \bf with &
\cltxt
All characters that are
non-graphic
\editend
\\
\edithead {\csdag 33}
\editstart
\\ \bf replace &
\cltxt
The standard newline and space characters have the respective
names {\clkwd Newline} and {\clkwd Space}. The semi-standard
characters have the names {\clkwd Tab, Page, Rubout, Linefeed,
Return,} and {\clkwd Backspace}.
\\ \bf with &
\cltxt
The standard newline and space characters have the respective
names {\clkwd Newline} and {\clkwd Space}.
\editend
\\
\edithead {\csdag 35}
\editstart
\\ \bf delete &
\cltxt
{\clkwd char-name} will only locate "simple" ...
\editend
\setcounter{subsection}{4}
\subsection{Character Control-Bit Functions} % 13.5.
\edithead {\csdag delete entire section}
\editstart
\editend
%----------------------------------------------------------------------
\setcounter{section}{13}
\section{Sequences} % 14
%----------------------------------------------------------------------
\setcounter{subsection}{0}
\subsection{Simple Sequence Functions} % 14.1
\edithead {\csdag 24}
\editstart
\\ \bf append &
\cltxt
If type {\clkwd string} is specified, a string of type
{\clkwd most-general-string} is returned.
\editend
\setcounter{subsection}{1}
\subsection{Concatenating, Mapping, and Reducing Sequences} % 14.2.
\edithead {\csdag 3}
\editstart
\\ \bf append &
\cltxt
If {\em result-type} {\clkwd string} is specified, any string
subtype which can hold the elements of the sequence can be returned.
\editend
\\
\edithead {\csdag 6}
\editstart
\\ \bf append &
\cltxt
If {\em result-type} {\clkwd string} is specified, any string
subtype which can hold the elements of the sequence can be returned.
\editend
\setcounter{subsection}{2}
\subsection{Modifying Sequences} % 14.3.
\edithead {\csdag 29}
\editstart
\\ \bf append &
\cltxt
If {\em newitem} is of type {\clkwd string}, any string subtype
which can hold the elements of the result sequence can be returned.
\editend
\\
\edithead {\csdag 36}
\editstart
\\ \bf append &
\cltxt
If {\em newitem} is of type {\clkwd string}, any string subtype
which can hold the elements of the result sequence can be returned.
\editend
\setcounter{subsection}{4}
\subsection{Sorting and Merging} % 14.5.
\edithead {\csdag 20}
\editstart
\\ \bf append &
\cltxt
If {\em result-type} {\clkwd string} is specified, any string subtype
which can hold the elements of the result sequence can be returned.
\editend
%----------------------------------------------------------------------
\setcounter{section}{17}
\section{Strings} % 18
%----------------------------------------------------------------------
\edithead {\csdag 1}
\editstart
\\ \bf replace &
\cltxt
Specifically, the type {\clkwd string} is identical to the type
{\clkwd (vector string-char),}
which in turn is the same as {\clkwd (array string-char (*))}.
\\ \bf with &
\cltxt
Specifically, the type {\clkwd string} is a subtype of
{\clkwd vector}
and consists of vectors specialized by subtypes of {\clkwd character}.
\editend
\setcounter{subsection}{0}
\subsection{String Access} % 18.1.
\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
{\clkwd schar} {\em simple-string index} [{\em Function}]
\\ \bf with &
\cltxt
{\clkwd schar} {\em simple-base-string index} [{\em Function}]
\editend
\\
\edithead {\csdag 4}
\editstart
\\ \bf replace &
\cltxt
character object. (This character will necessarily satisfy the
predicate
{\clkwd string-char-p}).
\\ \bf with &
\cltxt
character object.
\editend
\\
\edithead {\csdag 9}
\editstart
\\ \bf replace &
\cltxt
{\clkwd setf} may be used with {\clkwd char} to destructively
replace a character within a string.
\\ \bf with &
\cltxt
{\clkwd setf} may be used with {\clkwd char} to destructively
replace a character within a string.
The new character must be of a type which can be stored in the
string; it is an error otherwise.
\editend
\\
\edithead {\csdag 10}
\editstart
\\ \bf replace &
\cltxt
it must be a simple string.
\\ \bf with &
\cltxt
it must be a simple base string.
\editend
\setcounter{subsection}{2}
\subsection{String Construction and Manipulation} % 18.3.
\edithead {\csdag 2}
\editstart
\\ \bf replace &
\cltxt
{\clkwd make-string {\em size} \&key :initial-element [{\em Function}]}
\\ \bf with &
\cltxt
{\clkwd make-string {\em size} \&key :initial-element :element-type
[{\em Function}]}
\editend
\\
\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
This returns a string (in fact a simple string) of length {\em size},
each of whose characters has been initialized to the
{\clkwd :initial-element} argument. If an {\clkwd :initial-element}
argument is not specified, then the string will be initialized
in an implementation-dependent way.
\\ \bf with &
\cltxt
This returns a string of length {\em size},
each of whose characters has been initialized to the
{\clkwd :initial-element} argument. If an {\clkwd :initial-element}
argument is not specified, then the string will be initialized
in an implementation-dependent way.
The {\clkwd :element-type} argument names the type of the elements
of the string; a string is constructed of the most specialized
type that can accommodate elements of the given type.
\editend
\\
\edithead {\csdag 5}
\editstart
\\ \bf replace &
\cltxt
A string is really just a one-dimensional array of "string
characters" (that is,
those characters that are members of type {\clkwd string-char}).
More complex character arrays may be constructed using the function
{\clkwd make-array}.
\\ \bf with &
\cltxt
More complex character arrays may be constructed using the function
{\clkwd make-array}.
\editend
\\
\edithead {\csdag 29}
\editstart
\\ \bf replace &
\cltxt
If {\em x} is a string character (a character of type
{\clkwd string-char}), then
\\ \bf with &
\cltxt
If {\em x} is a character, then
\editend
%----------------------------------------------------------------------
\setcounter{section}{21}
\section{Input/Output} % 22
\setcounter{subsection}{0}
\subsection{Printed Representation of LISP Objects} % 22.1.
\setcounter{subsubsection}{0}
\subsubsection{What the Read Function Accepts} % 22.1.1.
\edithead {\csdag Table 22-1: Standard Character Syntax Types}
\editstart
\\ \bf delete entry &
\cltxt
{\clkwd <tab>} {\em whitespace}
\\ &
{\clkwd <page>} {\em whitespace}
\\ &
{\clkwd <backspace>} {\em constituent}
\\ &
{\clkwd <return>} {\em whitespace}
\\ &
{\clkwd <rubout>} {\em constituent}
\\ &
{\clkwd <linefeed>} {\em whitespace}
\editend
\setcounter{subsubsection}{1}
\subsubsection{Parsing of Numbers and Symbols} % 22.1.2.
\edithead {\csdag Table 22-3: Standard Constituent Character
Attributes}
\editstart
\\ \bf delete entry &
\cltxt
{\clkwd <backspace>} {\em illegal}
\\ &
{\clkwd <tab>} {\em illegal}
\\ &
{\clkwd <linefeed>} {\em illegal}
\\ &
{\clkwd <page>} {\em illegal}
\\ &
{\clkwd <return>} {\em illegal}
\\ &
{\clkwd <rubout>} {\em illegal}
\editend
\setcounter{subsubsection}{3}
\subsubsection{Standard Dispatching Macro Character Syntax} % 22.1.4.
\edithead {\csdag Table 22-4: Standard \# Macro Character Syntax}
\editstart
\\ \bf delete entry &
\cltxt
{\clkwd \#<backspace>} {\em signals error}
\\ &
{\clkwd \#<tab>} {\em signals error}
\\ &
{\clkwd \#<linefeed>} {\em signals error}
\\ &
{\clkwd \#<page>} {\em signals error}
\\ &
{\clkwd \#<return>} {\em signals error}
\\ &
{\clkwd \#<rubout>} {\em undefined}
\editend
\\
\edithead {\csdag 8}
\editstart
\\ \bf replace &
\cltxt
The following names are standard across all implementations:
\\ \bf with &
\cltxt
All characters, including extended characters, are uniquely
named in an implementation-dependent manner.
The following names are standard across all implementations:
\editend
\\
\edithead {\csdag 11 through 18 inclusive delete}
\editstart
\\ \bf delete &
\cltxt
The following names are semi-standard; ...
\editend
\\
\edithead {\csdag 20 through 26 inclusive delete}
\editstart
\\ \bf delete &
\cltxt
The following convention is used in implementations ...
\editend
\\
\edithead {\csdag 108}
\editstart
\\ \bf replace &
\cltxt
{\clkwd \#<space>, \#<tab>, \#<newline>, \#<page>, \#<return>}
\\ \bf with &
\cltxt
{\clkwd \#<space>, \#<newline>}
\editend
\setcounter{subsubsection}{4}
\subsubsection{The Readtable} % 22.1.5.
\edithead {\csdag 3}
\editstart
\\ \bf replace &
\cltxt
Even if an implementation supports characters with non-zero
{\em bits} and {\em font}
attributes, it need not (but may) allow for such characters to
have syntax
descriptions
in the readtable. However, every character of type
{\clkwd string-char}
must be represented in the readtable.
\\ \bf with &
\cltxt
Even if an implementation supports extended characters, it
need not
(but may) allow for such characters to
have syntax descriptions
in the readtable. However, every character of type
{\clkwd base-character} must be
represented in the readtable.
\editend
\setcounter{subsubsection}{5}
\subsubsection{What the Print Function Produces} % 22.1.6.
\edithead {\csdag 13}
\editstart
\\ \bf replace &
\cltxt
is used. For example, the printed representation of the character
\#$\backslash$A
with control
and meta bits on would be \#$\backslash${\clkwd CONTROL-META-A},
and that of
\#$\backslash$a with control and meta bits on would be
\#$\backslash${\clkwd CONTROL-META-$\backslash$a}.
\\ \bf with &
\cltxt
is used (see 22.1.4).
\editend
\setcounter{subsection}{2}
\subsection{Output Functions} % 22.3.
\setcounter{subsubsection}{0}
\subsubsection{Output to Character Streams} % 22.3.1.
\edithead {\csdag 26}
\editstart
\\ \bf replace &
\cltxt
({\em not} the substring delimited by {\clkwd :start} and
{\clkwd :end}).
\\ \bf with &
\cltxt
({\em not} the substring delimited by {\clkwd :start} and
{\clkwd :end}).
Only characters which are members of the character set(s)
associated with the output stream are valid to be written;
it is an error otherwise.
\editend
\\
\edithead {\csdag 27 after}
\editstart
\\ \bf insert &
\cltxt
{\clkwd external-width} {\em object} \&{\clkwd optional}
{\em output-stream} [{\em Function}]
\\ &
{\clkwd external-width} returns the number of host system base
character units required for the object on the output-stream. If
not applicable to the output stream, the function
should return {\clkwd nil}.
\editend
\\
\edithead {\csdag append to section}
\editstart
\\ \bf insert &
\cltxt
{\clkwd *format-external-width*} [{\em Variable}]
\\ &
{\clkwd *format-external-width*} specifies how numeric parameters
in a format control string are interpreted.
This allows the directive parameters
to be used in producing columnar output, as long as the width
in bytes of the external code format corresponds to the
resulting width of the displayed or printed output.
\\ &
If {\clkwd *format-external-width*} is {\clkwd T} then
{\clkwd format} uses the destination stream type to interpret
numeric parameters as external format units for this type of
stream; if the destination stream type is {\clkwd NIL}, numeric
parameters are interpreted as characters. This is the default.
If {\clkwd NIL}, {\clkwd format} interprets numeric parameters as
characters, regardless of the destination stream type.
If the value is a keyword that specifies an external code
format recognized by the implementation (see {\clkwd open}),
{\clkwd format} interprets numeric parameters as external format
units when the destination stream is {\clkwd NIL}. If the
destination stream type is not {\clkwd NIL}, this value has
no effect.
\editend
\setcounter{subsubsection}{2}
\subsubsection{Formatted Output to Character Streams} % 22.3.3.
\edithead {\csdag 23 delete example}
\editstart
\\ \bf delete &
\cltxt
{\clkwd (format nil "Type} $\tilde{ }$
{\clkwd :C to $\tilde{ }$ :A."} . . .
\editend
\\
\edithead {\csdag 66}
\editstart
\\ \bf replace &
\cltxt
$\tilde{ }${\clkwd :C} spells out the names of the control bits and
represents non-printing
characters by their names: {\clkwd Control-Meta-F, Control-Return,
Space}.
This is a "pretty" format for printing characters.
\\ \bf with &
\cltxt
$\tilde{ }${\clkwd :C}
represents non-printing
characters by their names: {\clkwd Newline,
Space}. This is a "pretty" format
for printing characters.
\editend
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\setcounter{section}{22}
\section{File System Interface} % 23
\setcounter{subsection}{1}
\subsection{Opening and Closing Files} % 23.2.
\edithead {\csdag 2}
\editstart
\\ \bf replace &
\cltxt
{\clkwd open {\em filename} \&key :direction :element-type}
{\clkwd :if-exists :if-does-not-exist}
[{\em Function}]
\\ \bf with &
\cltxt
{\clkwd open {\em filename} \&key :direction :element-type}
{\clkwd :character-set
:external-code-format}
{\clkwd :if-exists :if-does-not-exist}
[{\em Function}]
\editend
\\
\edithead {\csdag 11}
\editstart
\\ \bf replace &
\cltxt
{\clkwd string-char}
\\ &
The unit of transaction is a string-character. The functions
{\clkwd read-char}
and/or {\clkwd write-char} may be used on the stream. This is
the default.
\\ \bf with &
\cltxt
{\clkwd base-character}
\\ &
The unit of transaction is a base character. The functions
{\clkwd read-char}
and/or {\clkwd write-char} may be used on the stream. This is
the default.
\editend
\\
\edithead {\csdag 16}
\editstart
\\ \bf replace &
\cltxt
{\clkwd character}
\\ &
The unit of transaction is any character, not just a string-character.
The functions
\\ \bf with &
\cltxt
{\clkwd character}
\\ &
The unit of transaction is any character.
The functions
\editend
\\
\edithead {\csdag 19 after}
\editstart
\\ \bf insert &
\cltxt
{\clkwd :external-code-format}
\\ &
This argument specifies
keyword(s) indicating an implementation-recognized scheme for
representing one or more character sets with non-homogeneous codes.
\\ &
The default is the natural system character representation,
the base character representation.
\\ &
For example, the SO/SI SBCS/DBCS convention used by IBM on 370
machines could be selected by a keyword
{\clkwd :shift-delimited}.
The compact run-encoding convention defined by XEROX could be
selected by {\clkwd :run-encoded}.
The SBCS/DBCS convention based on
ASCII which uses leading bit patterns to distinguish two-byte codes
from one-byte codes could be selected by a keyword like
{\clkwd :high-byte-delimited}.
\\ &
As many {\clkwd :character-set} names must be provided as the
implementation requires for that external coding convention.
For example, if {\clkwd :shift-delimited} were the
{\clkwd :external-code-format} argument, two character set specifiers
would have to be provided.
\\ &
\editend
\\
\edithead {\csdag 19 after}
\editstart
\\ \bf insert &
\cltxt
{\clkwd :character-set}
\\ &
This argument specifies an implementation-defined
name or list of names of
defined character sets in the form of keywords.
The default is the base character set when
{\clkwd :external-code-format} is also defaulted. If a non-default
value is specified for {\clkwd :external-code-format}, there may be a
different default for {\clkwd :character-set}.
\editend
%----------------------------------------------------------------------
%----------------------------------------------------------------------
\begin{thebibliography}{wwwwwwww 99}
\bibitem[Ida87]{ida87} M. Ida, et al.,
{\em
JEIDA Common LISP Committee Proposal on Embedding Multi-Byte Characters
},
ANSI X3J13 document 87-022, (1987).
\bibitem[Linden87]{linden87} T. Linden,
{\em
Common LISP - Proposed Extensions for International Character Set
Handling
},
Version 01.11.87, IBM Corporation (1987).
\bibitem[Kerns87]{kerns87} R. Kerns,
{\em
Extended Characters in Common LISP
},
X3J13 Character Subcommittee document, Symbolics Inc (1987).
\bibitem[Steele84]{steele84} G. Steele Jr.,
{\em
Common LISP: the Language
},
Digital Press (1984).
\end{thebibliography}
\end{document} % End of document.
∂26-Sep-88 1032 CL-Characters-mailer relay of Ito message
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 26 Sep 88 10:31:40 PDT
Date: Mon, 26 Sep 88 09:05:16 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880926.090516.baggins@IBM.com>
Subject: relay of Ito message
======================================================================
Date: Mon, 26 Sep 88 14:31:59 jst
From: Takayasu ITO <ito%ito.ito.ecei.tohoku.junet%utokyo-relay.csnet@RELAY.CS.NET>
Return-Path: <ito@ito.ito.ecei.tohoku.junet>
Message-Id: <8809260531.AA00524@ito.ito.ecei.tohoku.junet>
To: baggins%ibm.com%relay.cs.net%u-tokyo.junet%utokyo-relay.csnet@RELAY.CS.NET
Status: R
Dear Dr. Linden,
I received your express airmail, which contains the DRAFT on Int'l Character Sets
for the X3J13 October meeting.
I have read the DRAFT DRAFT, which was given to me by Mr. Kurokawa of IBM Japan.
If the DRAFT is essentially the same as the DRAFT DRAFT, we have many comments on it.
Since we are going to have our 3rd Special Meeting on Character Sets on
October 3rd, we will let you know our opinions on your proposal and
on our proposal to ISO WG16 before the X3J13 October meeting.
On October 14 and 15 we are going to have a small meeting to prepare our
documents for ISO WG16. When we are ready to distribute them, we will send a
copy to you.
Thank you for your physical mail.
Sincerely,
Takayasu Ito
∂28-Sep-88 1236 CL-Characters-mailer comments on character proposal
Received: from cs.utah.edu by SAIL.Stanford.EDU with TCP; 28 Sep 88 12:36:26 PDT
Received: by cs.utah.edu (5.54/utah-2.0-cs)
id AA19059; Wed, 28 Sep 88 13:34:55 MDT
Received: by defun.utah.edu (5.54/utah-2.0-leaf)
id AA06808; Wed, 28 Sep 88 13:34:53 MDT
From: sandra%defun@cs.utah.edu (Sandra J Loosemore)
Message-Id: <8809281934.AA06808@defun.utah.edu>
Date: Wed, 28 Sep 88 13:34:51 MDT
Subject: comments on character proposal
To: cl-characters@sail.stanford.edu
The only thing I really found confusing about this proposal was the
elimination of the semi-standard characters and section 2.2.3 on
non-standard characters. With that gone, it is not left entirely
clear that an implementation may support other named characters
besides #\space and #\newline, until the reader gets to chapter 14 and
the description of CHAR-NAME, where we are told that all non-graphic
characters have names. I really think that chapter 2 ought to include
something to the effect that an implementation can support named
characters that are not in the standard character set.
On the whole, the proposal looks pretty good.
-Sandra
-------
∂29-Sep-88 1506 CL-Characters-mailer char-name
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 29 Sep 88 15:06:35 PDT
Date: Thu, 29 Sep 88 12:54:20 PDT
From: Thom Linden <baggins@ibm.com>
To: "Sandra J Loosemore" <sandra%defun@cs.utah.edu>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880929.125420.baggins@IBM.com>
Subject: char-name
>The only thing I really found confusing about this proposal was the
>elimination of the semi-standard characters and section 2.2.3 on
>non-standard characters. With that gone, it is not left entirely
>clear that an implementation may support other named characters
>besides #\space and #\newline, until the reader gets to chapter 14 and
>the description of CHAR-NAME, where we are told that all non-graphic
>characters have names. I really think that chapter 2 ought to include
>something to the effect that an implementation can support named
>characters that are not in the standard character set.
Thanks for the comment. I agree this would be a good addition.
Regards,
Thom
∂29-Sep-88 1507 CL-Characters-mailer character proposal comments
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 29 Sep 88 15:06:51 PDT
Date: Thu, 29 Sep 88 14:06:02 PDT
From: Thom Linden <baggins@ibm.com>
To: Dave Unietis <dru@lucid.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880929.140602.baggins@IBM.com>
Subject: character proposal comments
David,
Thanks for your review and comments.
In our upcoming Monday discussion (and, I imagine, in the J13 mtg) we
will definitely cover the simple-string and equivalency topics. We
don't want different semantics between ISO and ANSI for
simple-strings so this item must be resolved. As for
equivalency, I favor inclusion of static equivalence classes
but consider this orthogonal to the rest of the proposal and
therefore might be omitted from ANSI if we can't come to agreement.
(ps. I believe the ISO committee on character sets is working toward
some universal repertoire/encoding but I don't have any firm info)
Your contribution of experience with an implementation will be
quite helpful in resolving these items.
I'm afraid I don't have any comments/documents from JEIDA; I imagine
they are being drafted, as we (net)speak, for the ISO meeting in Nov.
I believe one of the important points may be inclusion of static
equivalence classes.
Regards,
Thom
=========================================================================
Received: from lucid.com by IBM.COM on 09/29/88 at 12:56:27 PDT
Received: from jack-jr ([192.9.200.25]) by heavens-gate.lucid.com id AA01768g; Thu, 29 Sep 88 11:53:34 PST
Received: by jack-jr id AA02812g; Thu, 29 Sep 88 12:52:13 PDT
Date: Thu, 29 Sep 88 12:52:13 PDT
From: Dave Unietis <dru@lucid.com>
Message-Id: <8809291952.AA02812@jack-jr>
To: baggins@ibm.com
In-Reply-To: Thom Linden's message of Fri, 16 Sep 88 17:03:24 PDT <880916.170324.baggins@IBM.com>
Subject: cs proposal
I received the latest draft of the character set proposal, and it seems
to adequately cover most of the issues raised by my earlier comments. The
issue I brought up concerning the type definition of most-general-string
was entirely my fault - I misread the type definition of string in the latest
draft. Defining the string type as a disjunction of other types solves
the problem satisfactorily.
I have a few remaining comments on the issues below:
* Simple-strings and SCHAR
We have no direct user experience to report here, but rather are basing our
opinion on the original JEIDA proposal as well as discussions with IBM Japan
and CSK, all of whom strongly desire compatible string access.
Furthermore, we've done some measurements of our prototype Kanji
implementations that treat SCHAR in this manner, and they indicate that the
performance impact is fairly small. Of course, this experience is only
relevant to general-purpose architectures; it may be more difficult and/or
expensive to re-implement SCHAR this way on microcoded Lisp machines - I
wonder how much influence this contingent has had on the discussion...
* Equivalence classes
To me, it seems unrealistic to expect ISO to standardize on a non-overlapping
character set, when all existing Kanji character sets (at least, all I know
about) contain a 'double-byte' version of either ASCII or EBCDIC embedded in
them.
* JEIDA
I'm concerned that their input may be arriving too late, especially if adopting
their recommendations would result in substantial revisions. The message you
forwarded from Professor Ito suggests that they do have significant comments.
At the very least, I feel we need to set aside part of the Monday meeting for a
review of their suggestions. If it is possible for you to get the meeting
attendees a copy in advance, it would be helpful.
Overall, the proposal is looking quite good.
- David
∂30-Sep-88 0013 CL-Characters-mailer character proposal comments
Received: from Riverside.SCRC.Symbolics.COM (SCRC-RIVERSIDE.ARPA) by SAIL.Stanford.EDU with TCP; 30 Sep 88 00:12:57 PDT
Received: from F.ILA.Dialnet.Symbolics.COM (FUJI.ILA.Dialnet.Symbolics.COM) by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 284704; 30 Sep 88 03:09:34 EDT
Received: from CALVARY.ILA.Dialnet.Symbolics.COM by F.ILA.Dialnet.Symbolics.COM via CHAOS with CHAOS-MAIL id 1089; Fri 30-Sep-88 02:08:02 EDT
Date: Fri, 30 Sep 88 02:08 EDT
From: RWK@FUJI.ILA.Dialnet.Symbolics.COM
Sender: MAS-B@FUJI.ILA.Dialnet.Symbolics.COM
Subject: character proposal comments
To: Dave Unietis <dru@lucid.com>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
In-Reply-To: <880929.140602.baggins@IBM.com>
Message-ID: <19880930060800.4.MAS-B@CALVARY.ILA.Dialnet.Symbolics.COM>
Date: Thu, 29 Sep 88 12:52:13 PDT
From: Dave Unietis <dru@lucid.com>
* Simple-strings and SCHAR
We have no direct user experience to report here, but rather are basing our
opinion on the original JEIDA proposal as well as discussions with IBM Japan
and CSK, all of whom strongly desire compatible string access.
Furthermore, we've done some measurements of our prototype Kanji
implementations that treat SCHAR in this manner, and they indicate that the
performance impact is fairly small. Of course, this experience is only
relevant to general-purpose architectures; it may be more difficult and/or
expensive to re-implement SCHAR this way on microcoded Lisp machines - I
wonder how much influence this contingent has had on the discussion...
Simple answer: no influence at all. SCHAR is exactly the same as CHAR
which is exactly the same as AREF for all microcoded implementations
which I am aware of. (I don't know what Xerox does, but I think I have
all the others covered). The whole purpose of SCHAR is to satisfy the
requirements of the so-called "general-purpose" architectures. (Really
now, wouldn't it be more accurate to call these specialized for non-lisp?)
* Equivalence classes
To me, it seems unrealistic to expect ISO to standardize on a non-overlapping
character set, when all existing Kanji character sets (at least, all I know
about) contain a 'double-byte' version of either ASCII or EBCDIC embedded in
them.
JIS does not have a second version of ASCII, but it does have second
versions of a great many ASCII symbols.
There is some question as to whether the embedded romaji characters
(i.e. "english letters") in JIS character set are the same characters
semantically as the ASCII characters, or are special symbols. Let me
list some of the confusing aspects:
1) ISO provides for switching between JIS and other language character
sets. Why do the JIS embedded romaji exist?
a) So you can use the JIS characters without ISO? This would imply
they mean the same.
b) So there are separate characters which can be used for special purposes
inside Kanji text. (Foreign words would normally be rendered in Katakana).
c) To indicate that the characters should be displayed in the same size
square as the Kanji. If they're otherwise the same characters, this would
argue for overlapping sets.
I suspect it would be helpful to know the real answer to this one.
2) Existing practice is inconsistent.
a) The Kanji Macintosh software I have seen is pretty pathetic; applications
suffer from "two byte disease". I cannot yet comment coherently on this
one because of the other poor qualities of MacKanji. (Carl Hoffman tells me
better software is available).
b) Japanese word processors I have seen vary in their handling of romaji. I
have seen them treated as just a spacing variant, and I have seen them
treated very differently on input. I haven't yet found out how
they treat them in searches, which I think is the definitive test. Perhaps
after I learn more Japanese...
c) The Symbolics Japanese support provides for optional canonicalization on
input. I believe this is because of conflicting requests, although I'm not
certain.
3) The Japanese community appears divided on the issue. This may actually only
be a communication problem, but I have been told both things: that they are
separate characters not considered to have the same meaning, and they are
distinct characters. I'm not sure how to identify a definitive answer to this.
I am sure much of my confusion comes from dealing with individual people and/or
organizations.
That all said, I can tell you that my bias is to treat them as having
the same meaning, and just a different typeface.
∂30-Sep-88 1235 CL-Characters-mailer character proposal comments
Received: from lucid.com by SAIL.Stanford.EDU with TCP; 30 Sep 88 12:35:40 PDT
Received: from jack-jr ([192.9.200.25]) by heavens-gate.lucid.com id AA00787g; Fri, 30 Sep 88 11:33:29 PST
Received: by jack-jr id AA05853g; Fri, 30 Sep 88 12:31:20 PDT
Date: Fri, 30 Sep 88 12:31:20 PDT
From: Dave Unietis <dru@lucid.com>
Message-Id: <8809301931.AA05853@jack-jr>
To: RWK@FUJI.ILA.Dialnet.Symbolics.COM
Cc: cl-characters@sail.stanford.edu
In-Reply-To: RWK@FUJI.ILA.Dialnet.Symbolics.COM's message of Fri, 30 Sep 88 02:08 EDT <19880930060800.4.MAS-B@CALVARY.ILA.Dialnet.Symbolics.COM>
Subject: character proposal comments
Date: Fri, 30 Sep 88 02:08 EDT
From: RWK@FUJI.ILA.Dialnet.Symbolics.COM
* Equivalence classes
There is some question as to whether the embedded romaji characters
(i.e. "english letters") in JIS character set are the same characters
semantically as the ASCII characters, or are special symbols. Let me
list some of the confusing aspects:
1) ISO provides for switching between JIS and other language character
sets. Why do the JIS embedded romaji exist?
a) So you can use the JIS characters without ISO? This would imply
they mean the same.
b) So there are separate characters which can be used for special purposes
inside Kanji text. (Foreign words would normally be rendered in
Katakana).
c) To indicate that the characters should be displayed in the same size
square as the Kanji. If they're otherwise the same characters, this
would argue for overlapping sets.
I suspect it would be helpful to know the real answer to this one.
I'm not sure a "real answer" exists, but I think one reason romaji came into
existence was to allow English letters and symbols to be input easily without
constantly shifting in and out of some special keyboard mode. Also, the
resulting combined Kanji/romaji data could be more easily formatted into
columns, tables and the like because the characters are all fixed-width.
Regardless of how important either one of these may be in the future, there
appears to be a large amount of existing programs and data that depend on
the "double-square" display characteristic of romaji.
3) The Japanese community appears divided on the issue. This may actually
only be a communication problem, but I have been told both things: that
they are separate characters not considered to have the same meaning,
and they are distinct characters. I'm not sure how to identify a
definitive answer to this. I am sure much of my confusion comes from
dealing with individual people and/or organizations.
That all said, I can tell you that my bias is to treat them as having
the same meaning, and just a different typeface.
I also have received much conflicting information on this issue, but the
consensus seems to be that when romaji characters are treated "syntactically"
(whatever that means), they should be considered equivalent to the
corresponding ASCII, but when treated as data, they should be processed
transparently. I'm sure much of the confusion stems from the fact that in
Lisp this distinction is quite difficult to make.
2) Existing practice is inconsistent.
a) The Kanji Macintosh software I have seen is pretty pathetic;
applications suffer from "two byte disease". I cannot yet comment
coherently on this one because of the other poor qualities of MacKanji.
(Carl Hoffman tells me better software is available).
b) Japanese word processors I have seen vary in their handling of romaji.
I have seen them treated as just a spacing variant, and I have seen
them treated very differently on input. I haven't yet found
out how they treat them in searches, which I think is the definitive
test. Perhaps after I learn more Japanese...
c) The Symbolics Japanese support provides for optional canonicalization
on input. I believe this is because of conflicting requests, although
I'm not certain.
Under Linden's equivalence class proposal, whenever a romaji character is read
in "non-escape" mode, such as when reading a left parenthesis to start a
list, or when reading a symbol, the character is first canonicalized to its
ASCII equivalent, and then processed. Thus '（' is converted to '(' and thus
"inherits" its syntax, and 'a', 'A', 'ａ' and 'Ａ' are all converted to 'A'
in symbols. When in escape mode, such as when reading strings, romaji
characters are left unchanged. If the equivalence class is defined as a
static rather than rebindable property of the character set, problems such
as uncertain symbol-EQness are avoided.
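The read-time behaviour described above can be sketched roughly as follows.
This is only an illustration, not text from the proposal: the function names
are hypothetical, and the code assumes an implementation whose CHAR-CODE
values are Unicode code points, where the double-width forms of ASCII '!'
through '~' occupy U+FF01 through U+FF5E.

```lisp
;; Sketch only. CANONICALIZE-ROMAJI and CANONICALIZE-TOKEN are hypothetical
;; names; the Unicode code-point assumption is not required by the standard.

(defun canonicalize-romaji (ch)
  "Map a double-width romaji character to its ASCII equivalent.
Any other character is returned unchanged."
  (let ((code (char-code ch)))
    (if (<= #xFF01 code #xFF5E)        ; double-width forms of ! through ~
        (code-char (- code #xFEE0))    ; constant offset back to ASCII
        ch)))

(defun canonicalize-token (string &key escaped)
  "Canonicalize STRING as a reader might in non-escape mode.
When ESCAPED is true (e.g. inside a string literal) the text is left alone."
  (if escaped
      (copy-seq string)
      ;; Fold to ASCII first, then apply the usual upcasing of symbol names.
      (string-upcase (map 'string #'canonicalize-romaji string))))
```

Because the mapping is a fixed function of the character code, two spellings
of the same symbol name always canonicalize to the same string, which is the
property the "static rather than rebindable" remark is after.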
I'm sure this won't make everyone happy, but it seems to come the closest of
the proposals I've heard so far. On the other hand, given the confusion
surrounding this issue, perhaps waiting for JEIDA's recommendation is the
right thing to do, if it can be obtained prior to the Oct. meeting.
Dave
∂30-Sep-88 1244 CL-Characters-mailer character proposal comments
Received: from lucid.com by SAIL.Stanford.EDU with TCP; 30 Sep 88 12:44:22 PDT
Received: from jack-jr ([192.9.200.25]) by heavens-gate.lucid.com id AA00804g; Fri, 30 Sep 88 11:42:14 PST
Received: by jack-jr id AA05874g; Fri, 30 Sep 88 12:40:52 PDT
Date: Fri, 30 Sep 88 12:40:52 PDT
From: Dave Unietis <dru@lucid.com>
Message-Id: <8809301940.AA05874@jack-jr>
To: cl-characters@sail.stanford.edu
Subject: character proposal comments
Date: Fri, 30 Sep 88 02:08 EDT
From: RWK@FUJI.ILA.Dialnet.Symbolics.COM
* Equivalence classes
There is some question as to whether the embedded romaji characters
(i.e. "english letters") in JIS character set are the same characters
semantically as the ASCII characters, or are special symbols. Let me
list some of the confusing aspects:
1) ISO provides for switching between JIS and other language character
sets. Why do the JIS embedded romaji exist?
a) So you can use the JIS characters without ISO? This would imply
they mean the same.
b) So there are separate characters which can be used for special purposes
inside Kanji text. (Foreign words would normally be rendered in
Katakana).
c) To indicate that the characters should be displayed in the same size
square as the Kanji. If they're otherwise the same characters, this
would argue for overlapping sets.
I suspect it would be helpful to know the real answer to this one.
I'm not sure a "real answer" exists, but I think one reason romaji came into
existence was to allow English letters and symbols to be input easily without
constantly shifting in and out of some special keyboard mode. Also, the
resulting combined Kanji/romaji data could be more easily formatted into
columns, tables and the like because the characters are all fixed-width.
Regardless of how important either one of these may be in the future, there
appears to be a large amount of existing programs and data that depend on
the "double-square" display characteristic of romaji.
3) The Japanese community appears divided on the issue. This may actually
only be a communication problem, but I have been told both things: that
they are separate characters not considered to have the same meaning,
and they are distinct characters. I'm not sure how to identify a
definitive answer to this. I am sure much of my confusion comes from
dealing with individual people and/or organizations.
That all said, I can tell you that my bias is to treat them as having
the same meaning, and just a different typeface.
I also have received much conflicting information on this issue, but the
consensus seems to be that when romaji characters are treated "syntactically"
(whatever that means), they should be considered equivalent to the
corresponding ASCII, but when treated as data, they should be processed
transparently. I'm sure much of the confusion stems from the fact that in
Lisp this distinction is quite difficult to make.
2) Existing practice is inconsistent.
a) The Kanji Macintosh software I have seen is pretty pathetic;
applications suffer from "two byte disease". I cannot yet comment
coherently on this one because of the other poor qualities of MacKanji.
(Carl Hoffman tells me better software is available).
b) Japanese word processors I have seen vary in their handling of romaji.
I have seen them treated as just a spacing variant, and I have seen
them treated very differently on input. I haven't yet found
out how they treat them in searches, which I think is the definitive
test. Perhaps after I learn more Japanese...
c) The Symbolics Japanese support provides for optional canonicalization
on input. I believe this is because of conflicting requests, although
I'm not certain.
Under Linden's equivalence class proposal, whenever a romaji character is read
in "non-escape" mode, such as when reading a left parenthesis to start a
list, or when reading a symbol, the character is first canonicalized to its
ASCII equivalent, and then processed. Thus '（' is converted to '(' and thus
"inherits" its syntax, and 'a', 'A', 'ａ' and 'Ａ' are all converted to 'A'
in symbols. When in escape mode, such as when reading strings, romaji
characters are left unchanged. If the equivalence class is defined as a
static rather than rebindable property of the character set, problems such
as uncertain symbol-EQness are avoided.
I'm sure this won't make everyone happy, but it seems to come the closest of
the proposals I've heard so far. On the other hand, given the confusion
surrounding this issue, perhaps waiting for JEIDA's recommendation is the
right thing to do, if it can be obtained prior to the Oct. meeting.
Dave
∂30-Sep-88 1554 CL-Characters-mailer Re: character proposal comments
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 30 Sep 88 15:54:05 PDT
Received: from Cabernet.ms by ArpaGateway.ms ; 30 SEP 88 14:53:09 PDT
Date: 30 Sep 88 14:52 PDT
From: masinter.pa@Xerox.COM
Subject: Re: character proposal comments
In-reply-to: RWK@FUJI.ILA.Dialnet.Symbolics.COM's message of Fri, 30 Sep 88
02:08 EDT
To: RWK@FUJI.ILA.Dialnet.Symbolics.COM
cc: Dave Unietis <dru@lucid.com>, "X3J13: Character Subcommittee"
<cl-characters@sail.stanford.edu>
Message-ID: <880930-145309-1143@Xerox>
In Xerox Common Lisp / Medley, SCHAR is slower interpreted, since it
actually checks that its argument is a string. The compiled optimizer
generates the same code as AREF.
Frankly, I think SCHAR is an odd beast -- most other declarations and type
annotations in the language are done with "the" and "declare".
Maybe it would do as well to do away with SCHAR. (A purist would eliminate
them all and say "use ELT", but that's probably going too far.)
My general point is that some of the optimizations that made sense at the
time CLtL was written no longer do, and we might be able to simplify the
language rather than make it more complex.
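For comparison, here is a small sketch of the three spellings under
discussion: SCHAR, CHAR with a DECLARE, and AREF with THE. The function
names are made up for illustration, and whether a given compiler produces
identical code for all three is implementation-dependent.

```lisp
(defun third-char-schar (s)
  ;; SCHAR itself carries the promise that S is a simple string.
  (schar s 2))

(defun third-char-declare (s)
  ;; The same promise, made with an ordinary type declaration.
  (declare (type simple-string s))
  (char s 2))

(defun third-char-the (s)
  ;; The same promise, made inline with THE and the general accessor AREF.
  (aref (the simple-string s) 2))
```

All three return the same character for a simple string argument; they
differ only in where the type information is written down.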
∂03-Oct-88 1159 CL-Characters-mailer subcommittee meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 3 Oct 88 11:59:37 PDT
Date: Mon, 03 Oct 88 11:36:22 PDT
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
cc: Mike Beckerle <rpk@wheaties.ai.mit.edu>
Message-ID: <881003.113622.baggins@IBM.com>
Subject: subcommittee meeting
Ok. Jan Z. has made arrangements for our subcommittee meeting
on Monday, 10 Oct. It will be at Contel from 9:30 to 5pm. I
don't have a room number so you'll have to ask for Mathis at Contel
reception. I'll be at the Holiday Inn Sunday evening.
Regards and safe travel,
Thom
∂06-Oct-88 1012 CL-Characters-mailer some comments on the proposal
Received: from decwrl.dec.com by SAIL.Stanford.EDU with TCP; 6 Oct 88 10:12:47 PDT
Received: by decwrl.dec.com (5.54.5/4.7.34)
id AA03124; Thu, 6 Oct 88 10:11:11 PDT
Date: Thu, 6 Oct 88 10:11:11 PDT
Message-Id: <8810061711.AA03124@decwrl.dec.com>
From: vanroggen%aitg.DEC@decwrl.dec.com
To: cl-characters@sail.stanford.edu
Subject: some comments on the proposal
Comments on
"DRAFT: Extensions to Common LISP"
"to Support International Character"
"Sets" dated 9 September 1988
by
Ron Brender
Digital Equipment Corporation
6 October 1988
Overall, I think the approach is excellent and provides a good
foundation for dealing with a variety of character sets in a useful
and flexible manner.
I note that I am not expert in the LISP language -- while I have
read a lot about LISP, I have never programmed in LISP.
Nonetheless, the definitional approach seems quite clear and I
hope the following comments will be of use.
1 OBJECTIVES
(See Section 1.1, pp3-4; also A.13.4, pp27-28.) It might be
appropriate to note in this introduction that the objectives
intentionally exclude a variety of issues that often come under the
title "Internationalization". Such topics include date and time
formats and the like. This would not be worth mentioning but
for the fact that char-upcase and char-downcase provide operations
that should be dependent on information outside of the character
set as such to perform properly. The standard example is lowercase
e-acute, which should convert to uppercase E-acute in French
French, but to uppercase E without acute in Canadian French. There
are many other examples even within the ISO Latin-1 character set.
In the absence of a more general attack on internationalization
issues (I don't recommend such an effort on the part of LISP at
this time -- wait for others to lead the way) the meaning of
char-upcase and char-downcase for characters outside of the LISP
standard character set should be explicitly specified as
implementation-defined.
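A minimal sketch of why case conversion can depend on information outside
the character set. The function name and the convention keywords are
hypothetical, and the code assumes CHAR-CODE values that agree with ISO
Latin-1 (e-acute at #xE9, E-acute at #xC9); neither is part of the proposal.

```lisp
;; Hypothetical sketch: UPCASE-WITH-CONVENTION, :KEEP-ACCENTS and
;; :DROP-ACCENTS are illustrative names only.

(defparameter *e-acute-lower* (code-char #xE9))   ; lowercase e-acute
(defparameter *e-acute-upper* (code-char #xC9))   ; uppercase E-acute

(defun upcase-with-convention (ch convention)
  "Upcase CH, letting CONVENTION decide what happens to lowercase e-acute."
  (cond ((and (char= ch *e-acute-lower*) (eq convention :drop-accents))
         #\E)                                     ; accent dropped on the capital
        ((char= ch *e-acute-lower*)
         *e-acute-upper*)                         ; accent kept on the capital
        (t
         (char-upcase ch))))                      ; everything else: the built-in
```

The same character yields two different uppercase results depending on a
parameter that the character set itself does not carry, which is why a
single portable definition of CHAR-UPCASE outside the standard characters
is hard to give.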
2 NAMING CHARACTER SETS AND REPERTOIRES
(See Section 2.1.) I support the decision to avoid any attempt to
provide names for particular character sets and/or other
repertoires at this time. Yet, it seems clear that portability of
applications would be enhanced if there were an established
lexicon. Moreover, this same issue will surely arise in one form
or another in the context of every programming language that adds
capability for large and/or multiple character sets.
I suggest X3J13 send a request to, most likely, ISO-IEC JTC1/SC22
(Programming Languages) recommending issues such as this that
should be addressed across programming languages, probably in
conjunction with SC2 (Character Sets and Controls) and SC21 (Data
Bases).
3 EXTERNAL-WIDTH AND FORMAT-EXTERNAL-WIDTH
(See Section 2.3, p11; also A.22.3.1, p35 [A.22.3.2 is missing?].)
The external-width function is most appropriate. Further, the
sentence observing that this function "does not address the problem
of display width" should be emphasized more strongly, even in the
absence of proportional fonts.
Further, even the suggestion that the format-external-width
variable is relevant to producing columnar output for (only)
certain external code formats deserves to be stricken entirely from
this discussion. If the production of simple columnar output is
worthwhile -- and I think it is -- then I urge that X3J13 search
for a means to achieve this that is independent of artifacts of the
external representation. The suggested approach happens to
more-or-less work at the moment with many common external
representations, but this is less likely to continue to be true
in the future -- in particular, as the Multiple-Octet Coded
Character Set being defined by ISO-IEC JTC1/SC2/WG2 comes into use
(an ISO DP is expected out by the end of this year (1988)).
Since assisting with columnar output seems to be the sole purpose
of this variable, I recommend it be withdrawn completely from this
proposal.
∂06-Oct-88 1054 CL-Characters-mailer Re: character proposal comments
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 6 Oct 88 10:54:15 PDT
Received: from Semillon.ms by ArpaGateway.ms ; 06 OCT 88 10:41:23 PDT
Date: 6 Oct 88 10:41 PDT
From: masinter.pa@Xerox.COM
Subject: Re: character proposal comments
In-reply-to: RWK@FUJI.ILA.Dialnet.Symbolics.COM's message of Fri, 30 Sep 88
02:08 EDT
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <881006-104123-2945@Xerox>
As far as I can tell, none of the standards committees working on character
identification within either X3 or ISO are working on an encoding where
there will be more than one code for the same semantic character
identifier, as there is in JIS.
While the current JIS standard has second versions of many ASCII
characters, it would seem inappropriate to bend the X3J13 standard to
support a feature which is not consistent with the other X3 or ISO
standards in preparation.
We should distinguish what might be the right technical decision for a
particular implementation from what is the right design for a national and
international standard; especially if the standard can accommodate
JIS-compatible extensions.
I've recently heard from the Xerox representative to X3L2 that they're also
working on character encoding schemes in conjunction with ISO JTC1 SC2 WG2.
I don't know the exact protocol here, but if X3J13 is to establish a
liaison, should it not be with the other ANSC X3 committees?
∂06-Oct-88 1136 CL-Characters-mailer Comments on ANSI Draft
Received: from RELAY.CS.NET by SAIL.Stanford.EDU with TCP; 6 Oct 88 11:36:10 PDT
Received: from relay2.cs.net by RELAY.CS.NET id aa15490; 6 Oct 88 12:58 EDT
Received: from utokyo-relay by RELAY.CS.NET id at23549; 6 Oct 88 12:33 EDT
Received: by ccut.cc.u-tokyo.junet (5.51/6.3Junet-1.0/CSNET-JUNET)
id AA08072; Thu, 6 Oct 88 17:46:00 JST
Received: by nttlab.ntt.jp (3.2/6.2NTT.h) with TCP; Thu, 6 Oct 88 15:58:45 JST
Received: by tutics.tut.junet (ver3.3/6.2J/systemV)
id AA04832; Thu, 6 Oct 88 11:06:50 jst
Message-Id: <8810061106.AA04832@tutics.tut.junet>
Date: Thu, 6 Oct 88 11:06:50 jst
From: Taiichi Yuasa <yuasa%tutics.tut.junet@UTOKYO-RELAY.CSNET>
To: cl-characters@SAIL.STANFORD.EDU
Subject: Comments on ANSI Draft
Cc: baggins@IBM.COM, mathis@b.isi.edu
Comments on
"DRAFT: Extensions to Common LISP to Support International Character Sets"
from the Character Set Subcommittee of Japanese SC22/LISP WG.
(compiled by Taiichi Yuasa, secretary of SC22/LISP WG, 05 OCT 88)
(a) Some technical terms are not clear. For instance,
1. The notion of "character set" is not defined.
This term appears several times in the draft:
"more than one character set" (page 6, line 16)
"multiple character sets" (page 6, line 17)
Does it mean "character repertoire" or "coded character set"?
This distinction must be clear because there may be
multiple coded character sets for a single character repertoire,
which is our case in Japan.
2. It is not clear what "an implementation SUPPORTs a character set"
means.
We ourselves have discussed what "support" means but have not
found any reasonable definition yet.
3. The sentence "it must define the sets supported and their
characteristics" (page 6, line 17) is quite vague.
(b) The relation among the string types is not clear.
We guessed
SIMPLE-BASE-STRING is a subtype of BASE-STRING, and
BASE-STRING is a subtype of MOST-GENERAL-STRING,
but we are not sure.
Also, we guessed
MOST-GENERAL-STRING is identical to STRING,
but then what is the role of the name MOST-GENERAL-STRING?
We do not know whether the draft suggests the possibility that
there are some strings that are not MOST-GENERAL-STRING.
(c) The draft specification leaves too many things unspecified.
We wonder if the specification will increase the international portability
of application programs.
Most of us would like to include Kanji characters in BASE-CHARACTER but
some of us would rather like to put them in EXTENDED-CHARACTER.
We found no description on this issue in the draft.
(d) We need some mechanism for syntactic equivalency among characters, such
as the one proposed by Thom Linden. We are wondering why such an
important mechanism is not included in the draft.
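As a point of reference for item (b): in the ANSI standard that eventually
resulted, the type names BASE-STRING and SIMPLE-BASE-STRING exist (there is
no MOST-GENERAL-STRING), and subtype relations of the kind guessed at above
can be checked with SUBTYPEP. A minimal sketch, with a hypothetical helper
name:

```lisp
(defun subtype-chain-p (&rest types)
  "True if each type in TYPES is a recognizable subtype of the one after it."
  ;; EVERY stops at the shorter list, so the last type has no successor to test.
  (every #'subtypep types (rest types)))
```

For example, (subtype-chain-p 'simple-base-string 'base-string 'string)
asks whether the chain from the simplest base string type up to STRING holds
in the running implementation.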
∂06-Oct-88 1506 CL-Characters-mailer characters comments
Received: from ti.com by SAIL.Stanford.EDU with TCP; 6 Oct 88 15:06:00 PDT
Received: by ti.com id AA20944; Thu, 6 Oct 88 17:04:42 CDT
Received: from Kelvin by tilde id AA27159; Thu, 6 Oct 88 16:52:12 CDT
Message-Id: <2801166760-10311358@Kelvin>
Sender: GRAY@Kelvin.csc.ti.com
Date: Thu, 6 Oct 88 16:52:40 CDT
From: David N Gray <Gray@DSG.csc.ti.com>
To: CL-Characters@SAIL.Stanford.edu
Cc: Bartley@MIPS.csc.ti.com
Subject: characters comments
Following are a few things that confused me about the proposal for
"Extensions to Common Lisp to Support International Character Sets"
(dated 9/9/88):
The semi-standard characters have been deleted without any reason given.
There doesn't seem to be any way to find out which repertoire a given
character object belongs to.
There doesn't seem to be any way to construct a character object for a
particular code and repertoire.
It is not clear what the meaning of CHAR-CODE-LIMIT is now. Does the
char code include identification of the repertoire? If so, it would
seem to be of little use. If not, then wouldn't the maximum code value
be different for different repertoires? If the code size is not able to
be different for different repertoires, then I don't see how the concept
of repertoires needs to be different from the old concept of font
numbers.
Does it really help to define standard keywords :CHARACTER-SET and
:EXTERNAL-CODE-FORMAT for the OPEN function if there are no standard
values for them? Also, if these are not specified when opening for
input, instead of specifying that the default is the base character set,
the proposal should permit defaulting from what the file system knows about
how the file was written.
∂07-Oct-88 0844 CL-Characters-mailer Symbolics comments on the Characters subcommittee report
Received: from STONY-BROOK.SCRC.Symbolics.COM (SCRC-STONY-BROOK.ARPA) by SAIL.Stanford.EDU with TCP; 7 Oct 88 08:44:23 PDT
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 472522; Fri 7-Oct-88 11:43:08 EDT
Date: Fri, 7 Oct 88 11:42 EDT
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: Symbolics comments on the Characters subcommittee report
To: CL-Characters@sail.stanford.edu
Message-ID: <19881007154235.1.MOON@EUPHRATES.SCRC.Symbolics.COM>
Comments from Symbolics on "DRAFT: Extensions to Common Lisp
to Support International Character Sets", dated Sep 9, 1988
OVERALL COMMENT
In general we agree with this proposal, but there are some defects
in it that need to be remedied before it can be acceptable. The
proposal is really not ready yet for voting.
MAJOR COMMENTS
* Pages 6 and 18 call for the meaning of the STRING-CHAR type specifier
to be incompatibly changed in the name of compatibility. We oppose this.
Compatibility would be much easier to achieve by eliminating STRING-CHAR
from the language, allowing a user or an implementation to define it
with DEFTYPE to be whatever they require for compatibility. (This would
leave (DECLARE (STRING-CHAR x)) undefined, unless an implementation added
it, since there is no way for a user to add declarations.)
* Page 11 says that (write-char #\newline stream) is no longer equivalent
to (terpri stream). This directly contradicts the last paragraph of CLtL
p.22, which this proposal does not amend. We can see no justification for
this incompatible change; outputting a newline character should remain
equivalent to calling the terpri function. The fact that many external
character encoding schemes treat newline as a special case applies equally
to the newline character and the terpri function and does not justify
changing them to be non-equivalent.
* Pages 11 and 34-5: The EXTERNAL-WIDTH function and FORMAT features are
much less well thought-out than the rest of the proposal, are described in
a self-contradictory way, and are unrelated to the main topic of this
proposal. They should be removed, and proposed separately when they have
been more carefully thought out. We could offer more detailed criticisms,
but that doesn't seem useful at this time. By the way, the Cleanup
committee issue STREAM-INFO appears to cover the same ground.
* Page 21 uses a type-specifier list (character :standard) in an example
but there is no definition of what this means nor what the valid syntax is.
* Pages 6, 23, and 25 mandate that CHAR-EQUAL is unaffected by all
implementation-defined character attributes. This is not an acceptable
generalization; the effect, if any, on CHAR-EQUAL of each
implementation-defined character attribute has to be specified as part of
the definition of that attribute. Symbolics Genera, for example, has one
implementation-defined character attribute that definitely should affect
CHAR-EQUAL and another that definitely should not.
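Two of the points above can be sketched in a few lines. This is an
illustration only: the helper function names are hypothetical, the DEFTYPE
assumes an implementation in which STRING-CHAR is not already defined, and
BASE-CHAR is used as the ANSI name for what the draft calls BASE-CHARACTER.

```lisp
;; 1. STRING-CHAR supplied by the user with DEFTYPE rather than by the language.
(deftype string-char ()
  'base-char)

(defun all-string-chars-p (string)
  "True if every element of STRING satisfies the user-defined STRING-CHAR type."
  (every (lambda (ch) (typep ch 'string-char)) string))

;; 2. Writing #\Newline versus calling TERPRI on the same kind of stream.
(defun line-via-write-char (text)
  (with-output-to-string (stream)
    (write-string text stream)
    (write-char #\Newline stream)))

(defun line-via-terpri (text)
  (with-output-to-string (stream)
    (write-string text stream)
    (terpri stream)))
```

On a string stream the two line-writing functions produce equal strings,
which is the equivalence the second comment asks to preserve. The DEFTYPE
covers TYPEP and (DECLARE (TYPE STRING-CHAR x)); the abbreviated form
(DECLARE (STRING-CHAR x)) is the case the first comment notes a user
cannot add.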
MINOR COMMENTS (not so minor that they can be ignored!)
The introduction makes no mention of extended typesetting symbols, such as
accent marks and the copyright and trademark symbols. If Lisp is to be
used for real-world applications, these are necessary.
Page 10 refers to the representation of coded character sets as keyword
symbols. Why not use CLOS objects? There might be reasons, but you should
state them. Also there should be a portable way to refer to the base
character set. In general the language representation of character sets
and of character repertoires is very poorly specified and the proposal
needs to be extended to cover this.
Pages 11, 36, 37: There are several problems with OPEN options:
The default value of the :EXTERNAL-CODE-FORMAT argument to OPEN should be
implementation-defined rather than required to be the "natural" encoding
(whatever that is). The only requirement should be that it be able to
encode the base character set. It should not be restricted from encoding
other character sets also. There should be a name for this default value,
probably :DEFAULT.
There should be a name for the "natural" encoding and there should be a
specification of the properties of the natural encoding that a programmer
can rely on. Suggestions for the name include :BASE, :NATURAL, and
:INTERCHANGE. The definition probably involves the concept of data
interchange with non-Lisp programs on the same system.
There should be names for standard encodings such as ASCII to allow
data interchange between differing systems.
There should be a defined value for the :CHARACTER-SET option that
specifies all characters that the Lisp implementation can represent. OPEN
should signal an error if this :CHARACTER-SET option is used together with
an :EXTERNAL-CODE-FORMAT option that cannot encode all the characters the
Lisp implementation can represent. Without this, there is no way to write
a correct program that stores arbitrary strings in a file.
The default value of the :ELEMENT-TYPE argument should be an
implementation-defined subtype of CHARACTER that can be a supertype of
BASE-CHARACTER, rather than specified to be exactly BASE-CHARACTER.
It's hard to understand why both :CHARACTER-SET and :ELEMENT-TYPE exist,
since they appear to control the same thing. It would be best to remove
:CHARACTER-SET and make sure that type-specifiers are expressive enough
to allow :ELEMENT-TYPE to do everything that :CHARACTER-SET could do.
The only justification for a separate :CHARACTER-SET option that can be
inferred from the proposal is that :EXTERNAL-CODE-FORMAT :SHIFT-DELIMITED
needs an -ordered- pair of character sets; this would be more appropriately
specified as a list :EXTERNAL-CODE-FORMAT (:SHIFT-DELIMITED cs1 cs2).
The guarantee on page 11 that input operations will never return characters
outside the character sets mentioned in the :CHARACTER-SET option should
be removed. It seems wrong to require more checking in input functions
than in output functions. The :EXTERNAL-CODE-FORMAT might be capable
of representing more characters than the :CHARACTER-SET option specifies.
Are the external code format names listed on page 37 a proposal for
standardized names, or merely illustrative examples?
The motivations for the above comments are:
- provide standard names for all portable concepts
- allow, but not require, implementations to make it easy to write
programs that work with multiple character sets without special effort
- put the specification of the internal representation of characters
in one and only one place in the options to OPEN
- put the specification of the external representation of characters
in one and only one place in the options to OPEN
Page 16 (referring to paragraph 6) implies that Space is not a graphic
character, but page 24 (referring to paragraph 6) implies that Space is
a graphic character. CLtL p.235 says Space is graphic; let's stick with
that.
Pages 19 and 20 introduce a new type named simple-base-string, in addition
to simple-string. If you think about how simple-string would be used for
compiler optimization, it makes sense for simple-string to be the name for
the single simplest representation, rather than a name for a whole family
of representations that would have to be discriminated at run time. Thus
what you call simple-base-string should be called simple-string, and what
you call simple-string should just be called (simple-array character (*)).
This would not be an incompatible change in the meaning of simple-string.
Simple-string would be analogous to simple-vector.
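The run-time discrimination concern can be illustrated with a small
sketch. It assumes an implementation in which simple-string covers more
than one representation, and it assumes the type names BASE-CHAR and
SIMPLE-BASE-STRING are available (the proposal spells the element type
BASE-CHARACTER). The function name STRING-REPRESENTATION is hypothetical.

```lisp
;; Hypothetical helper: reports which string representation an object has.
;; Clause order matters: the most specific type is tested first.
(defun string-representation (object)
  (typecase object
    (simple-base-string :simple-base-string)   ; the single simplest representation
    (simple-string      :other-simple-string)  ; simple, but not base
    (string             :non-simple-string)    ; e.g. has a fill pointer
    (t                  :not-a-string)))

;; Usage: a string created with a base element type.
(string-representation (make-string 3 :element-type 'base-char))
```

Code declared to operate on simple-string would need a dispatch of this
kind whenever simple-string names a family of representations rather than
one.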
Page 20 proposes to change (COERCE <integer> 'CHARACTER) incompatibly to be
synonymous with CODE-CHAR instead of INT-CHAR. This change seems
unmotivated. We would rather delete coercion from integers to characters
entirely, for the same reason that coercion from characters to integers is
not permitted.
Page 23 proposes an equivalence of CHAR-INT and CHAR-CODE, and of INT-CHAR
and CODE-CHAR. This is unnecessary and should be removed.
The last bullet on page 23 should be removed. Part of the definition of
each implementation-defined character attribute must be whether or not that
attribute is removed from symbol names by READ. Also the phrase "symbol
construction" is ambiguous (does it mean READ or INTERN or MAKE-SYMBOL?)
and should be avoided.
Page 30 (referring to paragraph 24) and page 31 (referring to paragraph 2)
amend MAKE-SEQUENCE and MAKE-STRING. There are several problems. The
proposal fails to make (MAKE-SEQUENCE 'STRING n) equivalent to
(MAKE-STRING n), including handling of the presence or absence of the
:INITIAL-ELEMENT option. It also fails to specify the default for the
:ELEMENT-TYPE argument to MAKE-STRING.
Earlier there was much controversy about whether by default strings should
be base or extended, so it's really unfortunate that the proposal fails to
take any stand on this issue. We propose that (MAKE-STRING n) and
(MAKE-SEQUENCE 'STRING n) return a base-string by default. When the
:INITIAL-ELEMENT option is specified, they return the most specialized
type that can accommodate that character.
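A minimal sketch of the behavior we have in mind follows. It assumes the
type name BASE-CHAR for the base character type (the proposal spells it
BASE-CHARACTER) and assumes that MAKE-STRING accepts an :ELEMENT-TYPE
argument. The function name MAKE-STRING-SKETCH is hypothetical, and only
two candidate element types are considered.

```lisp
;; Hypothetical illustration of the proposed defaulting rule:
;; no :INITIAL-ELEMENT  -> base string
;; :INITIAL-ELEMENT ch  -> most specialized string type that can hold ch
(defun make-string-sketch (n &key (initial-element nil supplied-p))
  (let ((element-type (if (and supplied-p
                               (not (typep initial-element 'base-char)))
                          'character    ; initial element needs the wider type
                          'base-char))) ; default, or element fits in base
    (if supplied-p
        (make-string n :initial-element initial-element
                       :element-type element-type)
        (make-string n :element-type element-type))))

;; Usage: with no initial element the result is a base string.
(make-string-sketch 3)
```

Under this rule (MAKE-SEQUENCE 'STRING n) would simply be defined to
return what the corresponding MAKE-STRING call returns.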
EDITORIAL COMMENTS
Shouldn't there be a reference to relevant ISO document(s) in the
bibliography?
The format of the later portion of the proposal, referring to locations
in CLtL by numbering paragraphs, is hard to follow. It would help to
mention a page number and a function name. In general, it is preferable
to propose what the Common Lisp language should be rather than to propose
how Guy Steele's book should be altered.
The page 14 description of the standard character subrepertoire needs an
example. There is an obvious candidate, namely $. The ISO character #o044
is a currency sign. Many ASCII terminals overseas have a glyph other than
dollar sign for this (e.g. Pound Sterling or Yen).
Page 15's table appears to contain some typographical errors (LV22, LX22,
the glyph for capital J is K) so we don't trust the table at all. Also,
what are these IDs? They don't appear anywhere else in the proposal.